Benchmark: Brownian motion
The main goal is to characterize heterogeneous diffusion processes without any kind of prior knowledge. To do so, we predict the diffusion coefficient at every time step, which naturally highlights potential changes in the behaviour along the trajectories. This allows us to obtain a deeper understanding of the underlying physical systems that drive the dynamics.
In the following analysis, we provide a thorough characterization of the model performance under various conditions and we show how to reproduce some figures of our paper.
Load the model
First of all, we need to load a trained model with which to perform the analysis.
We refer to the model training tutorial for details about how to train and save your models.
To load the models within a Learner, we need to provide some data loaders. However, we will use the model to study various scenarios, so we provide an empty dataloader and load the different data sets as we need them.
dim = 2
dls = DataLoaders.from_dsets([], [])  # Empty train and validation datasets
model = XResAttn(dim, n_class=1, stem_szs=(64,), conv_blocks=[1, 1, 1],
                 block_szs=[128, 256, 512], pos_enc=False,
                 n_encoder_layers=4, dim_ff=512, nhead_enc=8,
                 linear_layers=[], norm=False, yrange=(-3.1, 3.1))
model.to(default_device())
learn_diff = Learner(dls, model, loss_func=L1LossFlat(), model_dir=MODEL_PATH)
Now we can load the trained model weights.
learn_diff.load(f'xresattn_bm_{dim}d_1_to_4cp')
learn_diff.model.eval();
It is very important to define the model exactly with the same layers as we did during training. Otherwise, we won’t be able to load the trained parameters!
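To see why, recall that Learner.load restores the saved state_dict, and PyTorch rejects checkpoints whose tensor shapes do not match. A purely hypothetical illustration with a wrong block_szs:

# Hypothetical: same model class, different block_szs than at training time.
wrong_model = XResAttn(dim, n_class=1, stem_szs=(64,), conv_blocks=[1, 1, 1],
                       block_szs=[64, 128, 256], pos_enc=False,
                       n_encoder_layers=4, dim_ff=512, nhead_enc=8,
                       linear_layers=[], norm=False, yrange=(-3.1, 3.1))
learn_wrong = Learner(dls, wrong_model, loss_func=L1LossFlat(), model_dir=MODEL_PATH)
# learn_wrong.load(f'xresattn_bm_{dim}d_1_to_4cp')
# -> RuntimeError: Error(s) in loading state_dict: size mismatch ...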
Pointwise prediction
We study the overall performance of the method over heterogeneous Brownian motion trajectories. We look at the pointwise prediction error and study how the length of the segments within the trajectory affects it.
We compare our approach with a reference method based on the time-averaged mean squared displacement (TA-MSD) (see baselines). However, since this method cannot account for any heterogeneity in the trajectories, we provide it with the pre-segmented trajectories. This gives it a massive advantage and, thus, we also pass the pre-cut segments through our model.
With this segment-wise analysis, we can easily study the performance of all the methods as a function of the segment lengths.
We consider diffusion coefficients over six orders of magnitude \(D\in[10^{-3},10^3]\). Sometimes, our observations may fall outside of this range and our predictions saturate at one of the interval limits. In these cases, we can rescale the trajectories to fit within the predicted range, as we do in the experimental data analysis tutorials.
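As a quick illustration of that rescaling (a hypothetical sketch, not a library function): for Brownian motion, dividing the positions by a factor c divides the diffusion coefficient by c**2, so we can scale a saturating trajectory into range, predict, and shift the prediction back in log space.

import numpy as np

def rescale_trajectory(x, c=10.):
    "Divide positions by `c`: D -> D/c**2, i.e. log10(D) -> log10(D) - 2*log10(c)."
    return x / c

# Usage sketch, with `model_log10_pred` as a hypothetical stand-in:
# log10_D = model_log10_pred(rescale_trajectory(x, c=10.)) + 2*np.log10(10.)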
Generate the data
To evaluate the different methods, we need a proper test set. We can generate one in the same way that we generate the train set in the model training tutorial.
Skip the data generation if you already have a test set!
# OPTIONAL: create the test set.
# Don't need to run this cell if it already exists.
n_per_set, max_t, dim = 12000, 200, 2
Ds = np.logspace(-3, 3, 1000)
cps = [1, 2, 3, 4]
ds_fun = partial(create_bm_segmentation_dataset,
                 max_t=max_t, dim=dim, Ds=Ds, save=False)
datasets = [ds_fun(n_per_set, n_change_points=n_cp) for n_cp in cps]
ds = combine_datasets(datasets)
n_change = f"{min(cps)}_to_{max(cps)}"
save_path = DATA_PATH/get_bmds_fname(n_change, max_t, dim, 'test')
ds.to_pickle(save_path)
Load the test set.
# Skip this cell if you just generated it.
n_change = "1_to_4"
name = "test"
ds = load_dataset(n_change=n_change, dim=dim, bm=True, name=name)
We also load the data as data loaders with split_pct=1 (all data for validation). This allows us to loop over batches much more easily by setting shuffle=False.
bs = 200
dls = get_segmentation_dls(target='y_exp', dim=dim, n_change=n_change,
                           name=name, tfm_y=torch.log10, bm=True, bs=bs,
                           shuffle=False, split_pct=1.)
dls.device = default_device()
Get the predictions
We now run the predictions over the trajectories as a whole and store them in the dataframe to process them later.
for i, (x, y) in tqdm(enumerate(dls.valid)):
    pred = to_detach(learn_diff.model(x).squeeze())
    mae = to_detach((learn_diff.model(x).squeeze() - y).abs().mean(-1))
    l_pred = np.array([p for p in pred], dtype=object)
    ds.loc[i*bs:(i+1)*bs-1, 'mae'] = mae.numpy()
    ds.loc[i*bs:(i+1)*bs-1, 'pred'] = l_pred
With the full-trajectory predictions, we can proceed to the segment-wise analysis. We split the trajectories and their predictions at the true changepoints. Then, we perform the prediction specifically for each segment with both the model and the TA-MSD method. We choose the TA-MSD 1-2 estimator for the evaluation, which is optimal for Brownian motion.
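As a reference for what the baseline computes, here is a minimal sketch of a TA-MSD 1-2 estimator; the tutorial itself uses diffusion_coefficient_tamsd from its baselines module, so the helper below (with unit time step) is only an illustrative assumption:

import torch

def tamsd(x, delta):
    "Time-averaged MSD of a trajectory `x` of shape (T, dim) at lag `delta`."
    return ((x[delta:] - x[:-delta])**2).sum(-1).mean()

def d_tamsd_12(x, dt=1.):
    "Estimate D of Brownian motion from the TA-MSD at lags 1 and 2."
    # For BM, E[TA-MSD(delta)] = 2*dim*D*dt*delta, so the lag-2 minus lag-1
    # difference equals 2*dim*D*dt and cancels constant offsets (e.g., noise).
    return (tamsd(x, 2) - tamsd(x, 1)) / (2*x.shape[-1]*dt)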
Let’s define a prediction function to simplify the code.
def predict_sample(model, x):
    xb = x.unsqueeze(0).to(default_device())
    return to_detach(model(xb).squeeze())
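The processing loop below also relies on mean_absolute_error and mean_relative_error. Since predictions and targets live in log10 space, plausible minimal versions (sketches under that assumption, not necessarily the library's exact implementations) are:

def mae_log10(pred, true):
    "Mean absolute error between log10 predictions and log10 targets."
    return (pred - true).abs().mean()

def rel_err_log10(pred, true):
    "Mean relative error |D_pred/D_true - 1|, computed from log10 inputs."
    return (10**(pred - true) - 1).abs().mean()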
Now we can proceed to process the trajectories. The following piece of code is rather chonky, so let us give a brief overview. For every trajectory (outer loop), we process each of its segments (inner loop). For every segment, we compute the mean absolute error (MAE) and the mean relative error with the three aforementioned approaches.
segment_data = []

for i, row in tqdm(ds.iterrows()):
    x, y = row.x, torch.log10(row.y_exp.squeeze())
    pred, cps = row.pred, row.cp
    split_x = split_tensor(x.T, cps)
    split_y, split_pred = split_tensor(y, cps), split_tensor(pred, cps)

    splits = zip(split_x, split_y, split_pred)
    for j, (seg_x, seg_y, pred_cut) in enumerate(splits):
        # Prediction over full trajectory cut with true changepoints
        mae = mean_absolute_error(pred_cut, seg_y)
        rel_err = mean_relative_error(pred_cut, seg_y)

        # Prediction over segment
        pred_segment = predict_sample(learn_diff.model, seg_x - seg_x[0])
        mae_segment = mean_absolute_error(pred_segment, seg_y)
        rel_err_segment = mean_relative_error(pred_segment, seg_y)

        # Prediction over segment with TA-MSD
        pred_tamsd = diffusion_coefficient_tamsd(seg_x)
        mae_tamsd = mean_absolute_error(pred_tamsd, 10**seg_y[0])
        rel_err_tamsd = (pred_tamsd*10**(-seg_y[0]) - 1).abs()

        # Save the segment metrics
        segment_data.append({'sample': i, 'segment_idx': j,
                             'length': len(seg_y), 'x': seg_x, 'y': seg_y,
                             'pred_cut': pred_cut,
                             'pred_segment': pred_segment,
                             'pred_tamsd': pred_tamsd, 'mae': mae,
                             'rel_err': rel_err, 'mae_segment': mae_segment,
                             'rel_err_segment': rel_err_segment,
                             'mae_tamsd': mae_tamsd,
                             'rel_err_tamsd': rel_err_tamsd})

segment_ds = pd.DataFrame.from_records(segment_data)
Finally, we save all the data for later post-processing. The resulting dataframe is extremely heavy, so this is slow :D
segment_ds.to_pickle(DATA_PATH/"segment_analysis_test.pkl")
Overall performance
To get an intuition of our model's performance, we can do a qualitative analysis by looking at the predicted diffusion coefficient \(D_{\text{pred}}\) as a function of the ground truth \(D_{\text{true}}\) at every time step.
To do so, we simply concatenate the predictions for every segment, segment_ds.pred_cut, and their true labels, segment_ds.y.
true = np.concatenate([*segment_ds.y])
pred = np.concatenate([*segment_ds.pred_cut])
Now we can build the 2D histogram.
bins = [np.linspace(-3, 3, 61), np.linspace(-3.1, 3.1, 63)]
hist, true_edges, pred_edges = np.histogram2d(true, pred, bins=bins)
fig = plt.figure(figsize=(1.1*fig_size, fig_size))
plt.pcolor(hist.transpose()/hist.max(), cmap=cmap_hist1, vmax=0.8, rasterized=True)
xtick_pos = np.linspace(0, len(true_edges) - 1, np.ceil(len(true_edges)/30).astype(int))
xtick_labels = [fr'$10^{{{i:.0f}}}$' for i in true_edges[::30]]
ytick_pos = np.linspace(0, len(pred_edges) - 3, np.ceil(len(pred_edges)/30).astype(int)) + 1
ytick_labels = [fr'$10^{{{i:.0f}}}$' for i in pred_edges[1::30]]
plt.yticks(ytick_pos, labels=ytick_labels)
plt.ylabel(r'$D_{pred}$', fontsize=16)
plt.xticks(xtick_pos, xtick_labels)
plt.xlabel(r'$D_{true}$', fontsize=16)
plt.tick_params(labelsize=14)
The histogram shape is close to that of a perfect regressor, which would yield a straight diagonal line. Therefore, the model seems to be working pretty nicely!
Prediction error
To obtain a quantitative analysis, we can look at the relative error of the different methods.
lengths = segment_ds.length.unique().astype(int)
lengths.sort()

metrics = ['err', 'err_segment', 'err_tamsd']
metric_by_length = {m: {'mean': [], 'sem': [], 'global': None}
                    for m in metrics}
for m in metrics:
    means = [getattr(segment_ds, f"rel_{m}")[segment_ds.length == l].mean()
             for l in lengths]
    sems = [getattr(segment_ds, f"rel_{m}")[segment_ds.length == l].sem()
            for l in lengths]
    metric_by_length[m]['mean'] = np.array(means)
    metric_by_length[m]['sem'] = np.array(sems)
    metric_by_length[m]['global'] = (
        (getattr(segment_ds, f"rel_{m}")*segment_ds.length).sum() /
        segment_ds.length.sum()
    )
We save all these metrics to plot them nicely later.
figure_name = "relative_error_segment_length"
plot_data_path = (FIG_PATH/figure_name).with_suffix(".pkl")

with open(plot_data_path, 'wb') as f:
    pickle.dump(metric_by_length, f, protocol=pickle.HIGHEST_PROTOCOL)
Let’s see the overall relative error for each method.
print(f"STEP: {metric_by_length['err']['global']:.3f}")
print(f"STEP + segments: {metric_by_length['err_segment']['global']:.3f}")
print(f"TA-MSD + segments: {metric_by_length['err_tamsd']['global']:.3f}")
STEP: 0.226
STEP + segments: 0.189
TA-MSD + segments: 0.249
STEP outperforms the TA-MSD baseline despite its disadvantage. It even reduces the error by \(\sim 25\%\) when we consider the pre-segmented trajectory! In the plot right below, we see the prediction error over segments at different lengths.
fig = plt.figure(figsize=(1.5*fig_size, fig_size))
labels = ['STEP', 'STEP + segments', 'TA-MSD + segments']
markers = ['o', 's', 'D']

for i, m in enumerate(metric_by_length.keys()):
    mean, sem = metric_by_length[m]['mean'], metric_by_length[m]['sem']
    plt.plot(lengths, mean, color=colors[i], label=labels[i],
             zorder=-i, lw=linewidth)
    plt.fill_between(lengths, mean - sem, mean + sem, color=colors[i],
                     alpha=0.3, zorder=-i)

plt.grid(alpha=alpha_grid)
plt.legend(fontsize=11)
plt.tick_params(labelsize=14)
plt.ylim([0.05, 0.75])
plt.xticks([10, 50, 100, 150, 190])
plt.yticks([0.2, 0.4, 0.6])
plt.xlabel("Segment length", fontsize=16); plt.ylabel("Relative error", fontsize=16)
STEP systematically outperforms the TA-MSD baseline for short segments. Furthermore, when we provide it with the isolated segments, we achieve outstandingly low errors, especially for the shortest segments. This is reasonable: shorter segments carry much less information to properly determine \(D\), and they can even be confused with simple stochastic fluctuations of the process. This makes it harder to accurately determine the segment boundaries, as shown by the difference between the blue and red lines.
Changepoint detection
While our approach does not explicitly provide any changepoints, changes in diffusion properties emerge naturally in the predictions. Thus, we can use them to enhance dedicated changepoint detection algorithms. Here, we use the ruptures library, which implements a kernel changepoint detection algorithm (see baselines and their review paper).
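For reference, the detection step can be sketched directly with the ruptures API; the tutorial's ruptures_cp helper presumably wraps something similar (the linear kernel below is an assumption):

import numpy as np
import ruptures as rpt

def kernel_cps(signal, pen, min_size=2):
    "Sketch: kernel changepoint detection on `signal` of shape (T,) or (T, d)."
    algo = rpt.KernelCPD(kernel="linear", min_size=min_size).fit(np.asarray(signal))
    # `predict` returns the breakpoint indices, with the series length appended last
    return algo.predict(pen=pen)[:-1]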
We compare the results obtained by applying ruptures to the trajectory displacements and to the STEP predictions. To quantify the performance, we compute the Jaccard index (JI) between the predicted changepoints and the ground truth. This metric accounts for false positives and false negatives (see our model evaluation methods), which is very important here since the changepoint detection algorithm infers the number of changepoints, so there may be a different number of predicted and ground-truth changepoints.
The Jaccard index is a function of the true positives (TP), false positives (FP), and false negatives (FN): \[JI = \frac{TP}{TP + FP + FN}\,.\] Intuitively, it is the fraction of correct predictions out of all predicted and true changepoints.
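In code, a minimal sketch of such a function (the tutorial's own jaccard_index comes from its evaluation utilities), together with a worked example:

def jaccard_index_sketch(tp, fp, fn):
    "JI = TP / (TP + FP + FN)."
    return tp / (tp + fp + fn)

jaccard_index_sketch(tp=8, fp=1, fn=1)  # 0.8: 8 correct out of 10 distinct events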
We also compute the mean squared error (MSE) between the predicted changepoints and their corresponding true ones.
Generate the data
To gain better insight into the strengths of the methods, we test them on trajectories with a single changepoint. This way, we can study how the changepoint position and the difference between the diffusion coefficients of consecutive segments affect the performance.
Ds = np.logspace(-3, 3, 1000)
ds = create_bm_segmentation_dataset(50000, dim=dim, Ds=Ds)
ds['y'] = ds['y_exp'].apply(torch.squeeze)
ds['y_log'] = ds['y'].apply(torch.log10)
ds['D_diff'] = ds['y_log'].apply(lambda x: (x[1] - x[0]).item())
ds['cp'] = ds['cp'].apply(lambda x: x.numpy())
ds = ds.drop(columns=['y_mod', 'y_exp', 'models', 'exps'])
Get the predictions
Let’s get the predictions! We define a couple of functions to make our code more readable.
def make_batch(df, i, bs, col='x'):
    "Return a batch of samples from `df`."
    samples = [x.transpose(-1, 0) for x in df.loc[i*bs:(i+1)*bs-1, col]]
    return torch.stack(samples, dim=0)
def predict(model, x):
    "Get prediction of `model` on batch `x`."
    return to_detach(model(x.to(default_device())).squeeze())
Ready to go!
bs = 400
n_batch = np.ceil(ds.shape[0]/bs).astype(int)
for i in tqdm(range(n_batch)):
    xb = make_batch(ds, i, bs)
    pred = predict(learn_diff.model, xb)
    ds.loc[i*bs:(i+1)*bs-1, 'pred'] = np.array([p for p in pred],
                                               dtype=object)
Let’s save this for later.
ds_path = DATA_PATH/get_bmds_fname(1, max_t, dim, 'with_preds')
ds.to_pickle(ds_path)
Overall performance
Let's compute the JI for the task, feeding the kernel changepoint detection algorithm in ruptures with either the STEP predictions or the trajectory displacements.
To keep the code readable, we define a function to merge two dictionaries (ad hoc for our purposes).
from collections.abc import Iterable  # needed for the isinstance checks below

def merge_dict(dict_1, dict_2):
    "Merge the information of two dictionaries."
    merge = {}
    for k, v1 in dict_1.items():
        v2 = dict_2.get(k, [])
        if isinstance(v1, Iterable) and isinstance(v2, Iterable):
            merge[k] = [*v1, *v2]
        elif isinstance(v1, Iterable):
            merge[k] = [*v1, v2]
        elif isinstance(v2, Iterable):
            merge[k] = [v1, *v2]
        else:
            merge[k] = [v1, v2]
    return merge
Now we process every trajectory and retrieve the changepoints with ruptures using both the displacements and the STEP prediction. Then, we compute the JI between the prediction and the ground truth and, finally, the MSE for the true positives.
method_pen = {'pred': 2., 'displ': 6.}
methods = list(method_pen.keys())

metrics_method = {m: {} for m in methods}
for i, row in tqdm(ds.iterrows()):
    traj, pred, true_cp = row.x.numpy(), row.pred.numpy(), row.cp
    displacements = np.log(get_displacements(traj))

    for m in methods:
        seg_data = (pred if m == 'pred'
                    else displacements if m == 'displ'
                    else None)
        pred_cp = ruptures_cp(seg_data, pen=method_pen[m], min_size=5)
        metrics = evaluate_cp_prediction(true_cp, pred_cp)
        metrics_method[m] = merge_dict(metrics, metrics_method[m])
        for k, v in metrics.items():
            if metrics['tp']:
                ds.loc[i, f'cp_{m}_{k}'] = v[0] if k == 'sq_error' else v
            else:
                ds.loc[i, f'cp_{m}_{k}'] = -1 if k == 'sq_error' else v
Now we can compute the overall JI and MSE for both approaches over the whole test set.
method_mse, method_j_idx = {}, {}
for m, v in metrics_method.items():
    method_mse[m] = np.mean(v['sq_error'])
    tp, fp, fn = np.sum(v['tp']), np.sum(v['fp']), np.sum(v['fn'])
    method_j_idx[m] = jaccard_index(tp, fp, fn)
method_j_idx, method_mse
({'pred': 0.8333494778851929, 'displ': 0.7963921392139214},
{'pred': 0.6803905614320586, 'displ': 0.6163644925829997})
We see that running ruptures on top of STEP increases the JI from 0.796 to 0.833: the error share drops from \(1-0.796=0.204\) to \(1-0.833=0.167\), i.e., we reduce the errors by about \(20\%\). It is natural to expect that, by detecting more changepoints (higher JI), the method incurs a larger MSE. This is because we consider a threshold of \(\mathcal{E}=5\) points from the ground truth to count predictions as true positives. The results indicate that the additional changepoints detected with STEP lie, most likely, near the limit of the threshold.
Difference between consecutive segments
Now that we have an idea of the overall behaviour of both methods, let's look at the JI as a function of the relative difference between the \(D\) of consecutive segments. Intuitively, the larger the difference, the easier it is to detect where the change happens.
Beware that a difference in log space is a ratio in real space. Don't get confused :)
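Concretely, for consecutive coefficients \(D_1\) and \(D_2\), \[\log_{10} D_2 - \log_{10} D_1 = \log_{10}\frac{D_2}{D_1}\,,\] so a difference of 2 in log space corresponds to a ratio of \(D_2/D_1 = 10^2\).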
bins_D_diff = np.linspace(-6, 6, 100)  # Difference in log space
j_idx_by_diff = {m: [] for m in methods}
mse_by_diff = {m: [] for m in methods}

for low, high in zip(bins_D_diff[:-1], bins_D_diff[1:]):
    mask = (ds.D_diff >= low) & (ds.D_diff < high)
    for m in methods:
        tp = ds.loc[mask, f'cp_{m}_tp'].sum()
        fp = ds.loc[mask, f'cp_{m}_fp'].sum()
        fn = ds.loc[mask, f'cp_{m}_fn'].sum()
        j_idx_by_diff[m].append(jaccard_index(tp, fp, fn))

        mask_mse = mask & (ds.loc[:, f'cp_{m}_tp'] == 1)
        mse_by_diff[m].append(ds.loc[mask_mse, f'cp_{m}_sq_error'].mean())
method_color = {'pred': colors[0], 'displ': colors_light[3]}
method_label = {'pred': 'KCPD + STEP', 'displ': 'KCPD + displ'}
method_ls = {'pred': 'solid', 'displ': 'dashed'}

fig = plt.figure(figsize=(1.5*fig_size, fig_size))
plt_x = bins_D_diff[:-1] + np.diff(bins_D_diff)
for m in methods:
    plt.plot(plt_x, j_idx_by_diff[m], linewidth=linewidth, label=method_label[m],
             color=method_color[m], linestyle=method_ls[m])
plt.grid(alpha=alpha_grid)
plt.ylim([-0.05, 1.05])
plt.yticks([0., 0.5, 1.])
plt.xticks(np.arange(-6, 7, 2), [f"$10^{{{t}}}$" for t in np.arange(-6, 7, 2)])
plt.legend(fontsize=14)
plt.xlabel(r"$D_2/D_1$", fontsize=16)
plt.ylabel("Jaccard index", fontsize=16)
plt.tick_params(labelsize=14)
We see that the changepoint method with STEP does, indeed, outperform the baseline over the whole range of \(D_2/D_1\). They converge to the same results in the limit of very similar diffusion coefficients \(D_2\approx D_1\), where the change is barely detectable. Indeed, when \(D_2=D_1\) there is no change to detect!
Interestingly, we get a nearly perfect detection when the diffusion coefficients are just one order of magnitude apart.
Changepoint position
We can also study the impact of the changepoint position on the detection, which was shown to be a very important factor in the AnDi Challenge. Shorter segments are much harder to characterize and, by the same principle that made the error increase for short segments, we expect the performance to drop when the changepoint is near the trajectory ends.
bins_cp_pos = np.arange(10, 200, 10)
j_idx_by_position_D = {m: [] for m in methods}
for low, high in zip(bins_cp_pos[:-1], bins_cp_pos[1:]):
    mask = (low <= ds.cp) & (ds.cp < high)
    for m in methods:
        tp = ds.loc[mask, f'cp_{m}_tp'].sum()
        fp = ds.loc[mask, f'cp_{m}_fp'].sum()
        fn = ds.loc[mask, f'cp_{m}_fn'].sum()
        j_idx_by_position_D[m].append(jaccard_index(tp, fp, fn))
fig = plt.figure(figsize=(1.5*fig_size, fig_size))
for m in methods:
    plt.plot(j_idx_by_position_D[m], linewidth=linewidth, label=method_label[m],
             color=method_color[m], linestyle=method_ls[m])
x_tick_idx = np.array([0, 9, 17])
plt.xticks(x_tick_idx,
           [f'{low}-{high}' for low, high in zip(bins_cp_pos[x_tick_idx],
                                                 bins_cp_pos[x_tick_idx+1])])
plt.grid(alpha=alpha_grid)
plt.ylim([-0.05, 1.05])
plt.yticks([0., 0.5, 1.])
plt.ylabel("Jaccard index", fontsize=16)
plt.xlabel("Changepoint position", fontsize=16)
plt.tick_params(labelsize=14)
Again, we clearly see how STEP outperforms the baseline with trajectory displacements in all circumstances. Furthermore, both methods show impressive resilience to the changepoint position, as their performance is barely affected by it.
Effect of the number of segments
An important factor to consider is the number of segments along trajectories. We have trained our models over trajectories with 2 to 5 segments, so it is very important to assess whether they can generalize to an arbitrary number of segments.
Here, we study the pointwise prediction error and the Jaccard index for the changepoint detection in trajectories with various segments of constant length.
Generate the data
We generate a data set composed of trajectories with different numbers of segments at several constant segment lengths. This way, we can look at the impact of both the number of segments and their lengths separately.
We define our custom methods here just for this test, since we do not have ad hoc functions to generate this kind of data. You can skip these cells if you already have a test set.
@delegates(brownian_motion)
def constant_segment_trajs(N, segment_length, n_change_points=1,
                           Ds=np.logspace(-3, 3, 1000), **kwargs):
    "Create `N` trajectories with constant segment length."
    n_segments = n_change_points + 1
    max_t = n_segments * segment_length
    segment_D = np.random.choice(Ds, size=(N, n_segments))
    point_D = np.repeat(segment_D, segment_length, axis=-1)[:, None, :]
    bms = brownian_motion(N, max_t, point_D, **kwargs)
    return bms, point_D
def create_constant_segment_dataset(N, segment_length,
                                    n_change_points=1, dim=2):
    "Creates a dataset of `N` trajectories with constant segment length."
    data = []
    trajs, labels = constant_segment_trajs(N, segment_length,
                                           n_change_points=n_change_points,
                                           dim=dim)
    length = trajs.shape[-1]
    cp = np.arange(segment_length, length, segment_length)
    for traj, label in zip(trajs, labels):
        x, y = tensor(traj), tensor(label).squeeze()
        data.append({'dim': dim, 'len': length, 'seg_len': segment_length,
                     'n_cp': n_change_points, 'cp': cp,
                     'x': x, 'y': y, 'y_log': torch.log10(y)})
    return pd.DataFrame.from_records(data)
With these functions, we can easily create our data set. We consider trajectories with zero to ten changepoints and segment lengths of 20, 40 and 60.
n_set, segment_lengths, cps = 2000, [20, 40, 60], np.arange(11)
Ds = np.logspace(-3, 3, 1000)
datasets = [create_constant_segment_dataset(n_set, sl, n_change_points=n_cp)
            for sl in segment_lengths for n_cp in tqdm(cps)]
ds = combine_datasets(datasets, shuffle=False)
Get the predictions
With our dataset ready, we can proceed to perform the predictions. We store them in the same dataframe.
bs = 50
n_batch = np.ceil(ds.shape[0]/bs).astype(int)
for i in tqdm(range(n_batch)):
    xb = make_batch(ds, i, bs).to(default_device())
    pred = to_detach(learn_diff.model(xb)).squeeze()
    ds.loc[i*bs:(i+1)*bs-1, 'pred'] = np.array([p for p in pred], dtype=object)
Pointwise error
So far, we have characterized the prediction error with various metrics, such as the mean absolute error or the relative error, which focus on the pointwise performance.
However, the pointwise relative error is a rather volatile quantity. For instance, when there is a large change between consecutive diffusion coefficients, e.g., from \(10^3\) to \(10^{-3}\), a prediction that is perfect everywhere but misses the changepoint by a single time step sees its relative error over the whole trajectory shoot up to \(\sim10^6\). Therefore, to obtain a better understanding of the actual prediction error \(D_{\text{true}} - D_{\text{pred}}\), we consider the mean prediction over each segment, i.e., \(D_{\text{true}} - \langle D_{\text{pred}}\rangle_{\text{seg}}\).
def segment_rel_error(pred, y, cp):
    "Segment-wise relative error. Assumes `pred` and `y` in log10."
    segment_preds = split_tensor(pred, cp)
    segment_means = torch.stack([p.mean() for p in segment_preds])
    true_values = torch.cat([y[0].unsqueeze(0), y[cp]])
    return (10**(segment_means - true_values) - 1).abs().mean()
# metrics
ds['mae'] = (ds['pred'] - ds['y_log']).abs().apply(torch.mean)
ds['seg_rel_error'] = ds.apply(lambda x: segment_rel_error(x['pred'],
                                                           x['y_log'], x['cp']),
                               axis=1)
Save the data for later processing.
ds_path = DATA_PATH/"constant_segment_analysis.pkl"
ds.to_pickle(ds_path)
Finally, we compute the relative error as a function of the number of changepoints.
def get_metric_sem_per_cp(metric, ds):
    "Returns the metric per changepoint count from dataset `ds`."
    segment_lengths = ds.seg_len.unique().astype(int)
    n_cps = ds.n_cp.unique().astype(int)
    metric_per_cp = np.zeros((len(segment_lengths), len(n_cps)))
    err_per_cp = np.zeros_like(metric_per_cp)
    for k, length in enumerate(segment_lengths):
        # Mask as a function of n_cp, evaluated inside the comprehensions
        mask = lambda n_cp: (ds.seg_len == length) & (ds.n_cp == n_cp)
        metric_per_cp[k, :] = np.array([getattr(ds, metric)[mask(n_cp)].mean()
                                        for n_cp in n_cps])
        err_per_cp[k, :] = np.array([getattr(ds, metric)[mask(n_cp)].sem()
                                     for n_cp in n_cps])
    return metric_per_cp, err_per_cp
metric = 'seg_rel_error'
rel_err_per_cp, rel_err_sem_per_cp = get_metric_sem_per_cp(metric, ds)
markers = ['o', 's', 'D']
shades_blue = [colors_light[0], colors[0], colors_dark[0]]
n_seg_rel_error = ds.n_cp.unique().astype(int) + 1

fig = plt.figure(figsize=(1.5*fig_size, fig_size))
for k, (val, err, length) in enumerate(zip(rel_err_per_cp, rel_err_sem_per_cp,
                                           segment_lengths)):
    plt.plot(n_seg_rel_error, val, marker=markers[k], label=f"length {length}",
             c=shades_blue[k], markerfacecolor='w', lw=linewidth)
    plt.fill_between(n_seg_rel_error, val - err, val + err, alpha=0.3, color=colors[0])
plt.xticks([2, 6, 10])
plt.ylim([0.16, 0.39])
plt.legend(fontsize=14, loc='upper left')
plt.xlabel("Number of segments", fontsize=16)
plt.ylabel("Relative error", fontsize=16)
plt.grid(alpha=alpha_grid)
plt.tick_params(labelsize=14)
We see that the error increases linearly with the number of segments. This means that every additional segment contributes roughly the same amount of error, regardless of how many segments there already are.
Furthermore, we see that the segment length has a much larger impact than the number of segments. For instance, it is harder to characterize a trajectory with two segments of length 20 (40 time steps in total) than one with 11 segments of length 40 (440 time steps in total). However, in agreement with the first results in this tutorial, once we reach a certain minimum length, the error stabilizes.
Changepoint detection
Let’s see now the impact on the changepoint detection task. We will compute the JI as a function of the number of segments.
First of all, we filter out those trajectories without any changepoint.
ds_cp = ds[ds.n_cp != 0].reset_index(drop=True)
Now let’s calculate the true positives, false positives and false negatives of every prediction to obtain the JI (see above and our model evaluation methods).
j_idx_parameters = {'tp': [], 'fp': [], 'fn': []}
for _, (pred, cp) in ds_cp[['pred', 'cp']].iterrows():
    metrics = evaluate_cp_prediction(cp, ruptures_cp(pred.numpy(), pen=2.))
    for k, v in metrics.items():
        if k in j_idx_parameters.keys():
            j_idx_parameters[k].append(v)
for k, v in j_idx_parameters.items():
    ds_cp[k] = v
Finally, we can compute the JI as a function of the number of changepoints for every segment length.
j_idx_per_cp = np.zeros_like(rel_err_per_cp)[:, :-1]
for i, length in enumerate(ds_cp.seg_len.unique()):
    for j, n_cp in enumerate(ds_cp.n_cp.unique().astype(int)):
        mask = (ds_cp.seg_len == length) & (ds_cp.n_cp == n_cp)
        j_idx_per_cp[i, j] = jaccard_index(ds_cp.tp[mask].sum(),
                                           ds_cp.fp[mask].sum(),
                                           ds_cp.fn[mask].sum())
n_seg = ds_cp.n_cp.unique().astype(int) + 1
for k, (val, length) in enumerate(zip(j_idx_per_cp, segment_lengths)):
    plt.plot(n_seg, val, marker=markers[k], label=f"Length {length}",
             c=shades_blue[k], markerfacecolor='w', lw=linewidth)
    # ax[1].fill_between(n_seg, val - err, val + err, alpha=0.3, color=colors[0])
xticks = [2, 6, 10]
plt.xticks(xticks)
plt.ylim([0.7, 1.])
plt.yticks([0.7, 0.8, 0.9, 1.])
plt.grid(alpha=alpha_grid)
plt.legend(fontsize=14)
plt.ylabel("Jaccard Index", fontsize=16)
plt.xlabel("Number of segments", fontsize=16)
plt.tick_params(labelsize=14)
Again, we see that shorter segments are harder to characterize, and that segment length does not play a significant role beyond a certain threshold, as the curves for lengths 40 and 60 are very similar. In this case, we see that the trajectory length does, indeed, degrade the performance more than it does for the pointwise error, as the model tends to accumulate misclassified changepoints along the way.
Resilience to noise
As a final test, we assess the resilience of STEP to localization noise. We simulate it as white noise \(\sim\mathcal{N}(0,\sigma_{\text{noise}})\) added to the trajectories, just as we did in the model training tutorial.
Hence, we reuse the noiseless test set from the beginning of this tutorial and add noise at different \(\sigma_{\text{noise}}\).
(Optional) Load the data
Load the test set in case it is not in memory.
ds = load_dataset(n_change="1_to_4", dim=dim, bm=True, name="test")
Get the predictions
Let’s get the predictions over the noisy trajectories. First of all, however, we will pre-compute the logarithm of our targets and rename some columns.
ds['y'] = ds['y_exp'].apply(torch.squeeze)
ds['y_log'] = ds['y'].apply(torch.log10)
# ds['cp'] = ds['cp'].apply(lambda x: x.numpy())
ds = ds.drop(columns=['y_mod', 'y_exp', 'models', 'exps'])
To make the most out of our trajectories, we will add 128 random levels of noise to each of them. We sample \(\sigma_{\text{noise}}\in[10^{-6}, 10^0]\) uniformly in log space.
noise_samples = 128
noise_max, noise_min = 0, -6
noise_range = noise_max - noise_min
noise_traj = torch.rand(ds.shape[0], noise_samples)*noise_range + noise_min
Just like we have done at the beginning, we can compare the segment-wise prediction of STEP with the TA-MSD baseline.
shape = (ds.shape[0], len(ds.x[0]), noise_traj.shape[1])
pred_noise = torch.zeros(shape)
pred_seg_noise = torch.zeros(shape)
pred_tamsd_noise = torch.zeros(shape)

for i, (x, cps) in tqdm(ds[['x', 'cp']].iterrows()):
    noise = (torch.randn(noise_samples, *x.T.shape)
             * 10**noise_traj[i].unsqueeze(-1).unsqueeze(-1))
    noisy_x = x.T.unsqueeze(0) + noise
    noisy_x -= noisy_x[:, 0].unsqueeze(1)
    split_x = split_tensor(noisy_x.transpose(1, 0), cps)

    pred_noise[i] = predict(learn_diff.model, noisy_x).T

    pred_seg, pred_tamsd = [], []
    for seg_x in split_x:
        seg_x = (seg_x - seg_x[0]).transpose(1, 0)
        ones = torch.ones(seg_x.shape[1])
        pred = to_detach(learn_diff.model(seg_x.cuda()).squeeze())
        pred_seg.append(pred)
        pred_tamsd.append(torch.stack([ones*diffusion_coefficient_tamsd(s)
                                       for s in seg_x]))
    pred_seg_noise[i] = torch.cat(pred_seg, axis=-1).T
    pred_tamsd_noise[i] = torch.cat(pred_tamsd, axis=-1).T

predictions = dict(zip(['full', 'seg', 'tamsd'],
                       [pred_noise, pred_seg_noise, pred_tamsd_noise]))
With the predictions, let's compute the error at every point as a function of the noise-to-signal ratio \(\sigma_{\text{noise}}/D\).
Since the diffusion coefficient changes along the trajectories, we need to compute \(\sigma_{\text{noise}}/D\) at every time step of all the trajectories.
y = torch.stack([t for t in ds['y_log'].values])
rel_noise = noise_traj.unsqueeze(1) - y.unsqueeze(-1)
Now we can compute the relative error of every method at each time step.
errors = {k: p - y.unsqueeze(-1) if k != 'tamsd'
          else p/(10**y.unsqueeze(-1)) - 1  # Already get rel error for tamsd
          for k, p in predictions.items()}
Since we’re dealing with infinitely many combinations of \(\sigma_{\text{noise}}\) and \(D\), we compute the error as a function of small intervals of \(\sigma_{\text{noise}}/D\).
bins = torch.linspace(rel_noise.min(), rel_noise.max(), 100)
noise_err, noise_err_std, x, counts = [], [], [], []

for low, high in tqdm(zip(bins[:-1], bins[1:])):
    mask = (rel_noise >= low) & (rel_noise <= high)
    x.append((low + high)/2)
    counts.append(mask.float().sum())
    rel_errors = [(10**err[mask] - 1).abs() if k != 'tamsd'
                  else err[mask].abs()
                  for k, err in errors.items()]
    noise_err.append(tensor([rel_err.mean() for rel_err in rel_errors]))
    noise_err_std.append(tensor([rel_err.std() for rel_err in rel_errors]))

noise_err = torch.stack(noise_err)
noise_err_std = torch.stack(noise_err_std)
x = torch.stack(x)
counts = torch.stack(counts)
Let’s save these to process later.
fig_name = "noise_analysis"
plot_path = (FIG_PATH/fig_name).with_suffix(".pkl")
plot_data = (noise_err, noise_err_std, x, counts)
with open(plot_path, 'wb') as f:
    pickle.dump(plot_data, f, protocol=pickle.HIGHEST_PROTOCOL)
And let’s see what we got!
sem = noise_err_std/counts.sqrt().unsqueeze(-1)

plt.figure(figsize=(1.5*fig_size, fig_size))
for k, (err, s) in enumerate(zip(noise_err.T, sem.T)):
    plt.semilogx(10**x, err, '-', linewidth=2., c=colors[k], label=labels[k])
    plt.fill_between(10**x, err - s, err + s, alpha=0.5, color=colors[k])
plt.legend(fontsize=14)
plt.grid(alpha=alpha_grid)
plt.xlim([10**(-4.5), 30])
plt.ylim([0.15, 1.])
plt.xlabel(r"$\sigma_{noise}/D$", fontsize=16)
plt.ylabel("Relative error", fontsize=16)
plt.tick_params(labelsize=14)
STEP is significantly more resilient to localization noise than the TA-MSD method. It only gets outperformed at noise levels greater than the actual signal, which correspond to unrealistic experimental conditions.