from functools import partial
from pathlib import Path
import os

import matplotlib.pyplot as plt
import numpy as np

ROOT = Path(os.path.dirname(os.path.abspath('../')))
DATA_PATH = ROOT/"data/"
DATA_PATH.mkdir(exist_ok=True, parents=True)
FIG_PATH = DATA_PATH/"figures"
FIG_PATH.mkdir(exist_ok=True, parents=True)
MODEL_PATH = ROOT/"models"
MODEL_PATH.mkdir(exist_ok=True, parents=True)

MODEL_DATA = {
    0: {'name': 'attm', 'exps': (0.05, 1.)},
    1: {'name': 'ctrw', 'exps': (0.05, 1.)},
    2: {'name': 'fbm', 'exps': (0.05, 1.95)},
    3: {'name': 'lw', 'exps': (1.05, 2.)},
    4: {'name': 'sbm', 'exps': (0.05, 2.)}
}

DEFAULT_TOKEN = -1
Data
Data info
We provide a set of variables with default information about the nature of the data and the default paths to save it. By default, we keep the root of the repository in ROOT, from which we define DATA_PATH=ROOT/'data' and MODEL_PATH=ROOT/'models'; subsequently, we define FIG_PATH=DATA_PATH/'figures'. We use these as default paths to save and load the data, the trained models and the output figures.
MODEL_DATA is a dictionary containing the information about the different anomalous diffusion models that we consider, and DEFAULT_TOKEN=-1 is the default value for the beginning-of-sequence token.
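For instance, we can look up the name and valid exponent range of the model with label 2 directly in MODEL_DATA:
MODEL_DATA[2]['name'], MODEL_DATA[2]['exps']
('fbm', (0.05, 1.95))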
Data generation
To train our models, we make extensive use of simulated trajectories. The goal is to reproduce realistic experimental conditions so that our models generalize well to experimental applications.
We follow two main approaches to simulate our data, depending on whether we're working with anomalous diffusion or Brownian motion (normal diffusion).
However, the main framework is common to both. To generate trajectories with changes in diffusive behaviour, we simulate as many full trajectories as segments we wish to have. Then, we take a sample segment of each and combine them to obtain the resulting heterogeneous trajectory, as sketched below.
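As a rough illustration of this idea (a conceptual sketch, not the library implementation; combine_trajectories below handles the general case), we can stitch one segment from each of two trajectories at a random changepoint:
import numpy as np

def stitch_two(traj_a, traj_b, seed=None):
    # Conceptual sketch: pick a random changepoint and join one segment of
    # each trajectory, shifting the tail so that positions stay continuous.
    rng = np.random.default_rng(seed)
    cp = rng.integers(1, len(traj_a))
    tail = traj_b[cp:] - traj_b[cp] + traj_a[cp]
    return np.concatenate([traj_a[:cp], tail])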
combine_trajectories
combine_trajectories (datasets, dim, margin=10, random_lengths=False)
Combine the trajectories in datasets to create heterogeneous trajectories by assigning random changepoints in between.
We use pandas DataFrames to store our data.
trajs2df
trajs2df (trajectories, labels, change_points, dim, noise=None)
Stores all the trajectory information in a pandas.DataFrame.
Every row contains the information corresponding to a trajectory (see the example after this list):
- dim: dimension of the trajectory (1D, 2D, 3D, …) (int).
- len: trajectory length (int).
- n_cp: number of changepoints (int).
- cp: changepoint positions (torch.tensor).
- models: anomalous diffusion model of each segment (torch.tensor).
- exps: anomalous diffusion exponent of each segment (torch.tensor).
- x: trajectory (torch.tensor).
- y: models and exps for every time step (torch.tensor).
- y_mod: anomalous diffusion model for every time step (torch.tensor).
- y_exp: anomalous diffusion exponent for every time step (torch.tensor).
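Once a dataset df is built (see create_andi_segmentation_dataset below), a single row can be inspected like this. A sketch; we assume x is stored with shape (dim, len), consistent with the plotting code further down:
row = df.loc[0]   # one trajectory
row.x.shape       # (dim, len)
row.y_mod.shape   # one model label per time step, i.e. (len,)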
Anomalous diffusion segmentation dataset
We simulate anomalous diffusion trajectories following a recipe similar to the one proposed in andi_datasets for the AnDi challenge.
As we specify in MODEL_DATA, there are five anomalous diffusion models that we consider, each with a different range of anomalous diffusion exponent \(\alpha\):
- Annealed transit time (ATTM) with \(\alpha\in\left[0.05, 1\right]\).
- Continuous time random walk (CTRW) with \(\alpha\in\left[0.05, 1\right]\).
- Fractional Brownian motion (FBM) with \(\alpha\in\left[0.05, 1.95\right]\).
- Lévy Walk (LW) with \(\alpha\in\left[1.05, 2\right]\).
- Scaled Brownian motion (SBM) with \(\alpha\in\left[0.05, 2\right]\).
We take \(\alpha\) values in intervals of 0.05 within the specified ranges.
Furthermore, we add localization noise in the form of white noise \(\sim\mathcal{N}(0, \sigma_{\text{noise}})\). We also show how to build a data augmentation scheme with localization noise, for which having noiseless trajectories is useful. Simply specify noise=[0] for that.
add_localization_noise
add_localization_noise (trajs, noise_amplitude=[0.01, 0.5, 1])
Adds white noise with standard deviation noise_amplitude.
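For instance, a simple augmentation scheme could simulate noiseless trajectories once and add a fresh noise realization on every pass. This is a sketch; we assume add_localization_noise accepts the raw trajectory array and rely on the fact (shown below) that the first two columns of the simulated matrix hold the labels:
# noiseless FBM trajectories, simulated once
clean = create_andi_trajectories(n_traj=10, max_t=50, dim=1,
                                 exponents=np.arange(0.2, 1., 0.2),
                                 models=[2], noise=[0])
# add a fresh noise realization to the trajectory part (columns 2 onwards)
noisy = clean.copy()
noisy[:, 2:] = add_localization_noise(clean[:, 2:], noise_amplitude=[0.1])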
create_andi_trajectories
create_andi_trajectories (n_traj, max_t, dim, exponents, models, noise=[0.1, 0.5, 1.0])
Creates anomalous diffusion trajectories.
For instance, we can generate a bunch of 2D FBM and SBM trajectories of 20 time steps and two levels of noise \(\sigma_{\text{noise}}=\left\{0.1, 0.5\right\}\). Furthermore, we can specify the desired values for \(\alpha\). Since both models can have a wide range of values, let’s make them purely super-diffusive \(\alpha>1\).
n_traj = 60
max_t = 20
dim = 2
exponents = np.arange(1.2, 2., 0.05)
models = [2, 4]  # FBM and SBM
noise = [0.1, 0.5]
trajectories = create_andi_trajectories(n_traj, max_t, dim, exponents, models, noise=noise)
The function create_andi_trajectories tries to balance the classes for diffusion model and \(\alpha\). Usually, this results in small deviations from the specified number of trajectories. Sorry for the inconvenience!
Code
print(f"We asked for {n_traj} trajectories and got {trajectories.shape[0]}")
We asked for 60 trajectories and got 62
As this uses andi_datasets.datasets_theory to generate the trajectories, the result is an \(N\times(2+dT)\) matrix, where \(N\) is the number of trajectories, \(d\) is the dimension, and \(T\) is the trajectory length. The first two columns contain the trajectory information: model and \(\alpha\).
trajectories.shape
(62, 42)
trajectories[:6, :2]
array([[2. , 1.2 ],
[2. , 1.2 ],
[2. , 1.25],
[2. , 1.25],
[2. , 1.3 ],
[2. , 1.3 ]])
However, this method only generates trajectories without changes along the way. As we mentioned above, we use it as a building block to generate our heterogeneous trajectories.
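For instance, reusing the FBM and SBM settings from above, we could build heterogeneous trajectories by hand. A sketch; we assume combine_trajectories accepts a list of trajectory sets in the format returned by create_andi_trajectories:
fbm = create_andi_trajectories(30, max_t, dim, exponents, models=[2], noise=[0.1])
sbm = create_andi_trajectories(30, max_t, dim, exponents, models=[4], noise=[0.1])
# one changepoint per trajectory: one segment from each dataset
combined = combine_trajectories([fbm, sbm], dim=dim, margin=10)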
get_andids_fname
get_andids_fname (n_change_points, max_t, dim, name='')
Returns standardized file name for segmentation dataset.
create_andi_segmentation_dataset
create_andi_segmentation_dataset (n_traj:int, max_t:int=200, dim:int=1, n_change_points:int=1, models:list=[0, 1, 2, 3, 4], exponents:numpy.ndarray|None=None, noise:list=[0.1, 0.5, 1.0], path:pathlib.Path|str|None=None, save:bool=True, name:str='', margin=10, random_lengths=False)
Creates a dataset for trajectory segmentation of anomalous diffusion.
| | Type | Default | Details |
|---|---|---|---|
| n_traj | int | | Number of trajectories |
| max_t | int | 200 | Maximum trajectory length |
| dim | int | 1 | Trajectory dimension |
| n_change_points | int | 1 | Number of changepoints in the trajectories |
| models | list | [0, 1, 2, 3, 4] | Diffusion models to consider |
| exponents | numpy.ndarray or None | None | Anomalous exponents to consider. Defaults to the full range |
| noise | list | [0.1, 0.5, 1.0] | Noise standard deviation |
| path | pathlib.Path, str or None | None | Path to save the data |
| save | bool | True | Whether to save the data |
| name | str | | Optional name for the dataset |
| margin | int | 10 | |
| random_lengths | bool | False | |
| Returns | DataFrame | | |
Let’s create a segmentation dataset of a few trajectories.
n_traj = 100
df = create_andi_segmentation_dataset(n_traj, save=False)
The resulting DataFrame contains trajectories with one changepoint (by default), combining all possible diffusion models and anomalous diffusion exponents.
df.columns
Index(['dim', 'len', 'n_cp', 'cp', 'models', 'exps', 'x', 'y', 'y_mod',
'y_exp', 'noise'],
dtype='object')
Let’s look at an example.
x0 = df.loc[0]
x0.exps, x0.models, x0.cp
(tensor([0.2500, 0.2500]), tensor([1, 2]), tensor([113]))
This trajectory changes from CTRW with \(\alpha=0.25\) to FBM with \(\alpha=0.25\) at frame 113.
Code
plt.figure(figsize=(7,4))
plt.plot(x0.x[0])
plt.vlines(x0.cp.item(), min(x0.x[0]), max(x0.x[0]), linestyles='dashed', colors='k')
plt.grid()
plt.title("Trajectory with one change point")
plt.xlabel("Time step")
plt.ylabel("Position");
Brownian motion
We simulate Brownian motion by taking the displacements to be white noise with standard deviation \(\sqrt{2D\delta t}\), where \(D\) is the diffusion coefficient and \(\delta t\) is the time-step duration (we generally take \(\delta t=1\)).
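This recipe amounts to a couple of NumPy lines; here is a conceptual sketch (not the library's brownian_motion below):
import numpy as np

def bm_sketch(n_traj, max_t, D, dt=1.):
    # displacements are white noise with standard deviation sqrt(2*D*dt)
    disp = np.sqrt(2 * D * dt) * np.random.randn(n_traj, max_t)
    disp[:, 0] = 0.  # start every trajectory at the origin
    return np.cumsum(disp, axis=1)  # positions are cumulative displacements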
brownian_motion
brownian_motion (n_traj, max_t, D, dim=1, dt=1)
Simulate Brownian motion trajectories.
create_bm_trajectories
create_bm_trajectories (n_traj, max_t, Ds=[1.0], shuffle=True, dim=1, dt=1)
Simulate Brownian motion trajectories with various diffusion coefficients.
Similar to create_andi_trajectories, create_bm_trajectories evenly distributes the number of trajectories among the diffusion coefficients we consider.
n_traj = 9
max_t = 5
trajectories = create_bm_trajectories(n_traj, max_t, dim=1, Ds=[1., 2.], shuffle=False)
trajectories.shape
(8, 7)
Although Brownian motion has a single parameter, the diffusion coefficient \(D\), the trajectories still carry two extra columns, which are simply two copies of \(D\). This keeps the format consistent with the anomalous diffusion trajectories, which need to store both the model and \(\alpha\).
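We can check this on the trajectories generated above; since we passed shuffle=False, the first rows should contain two copies of \(D=1\):
trajectories[:2, :2]  # expected: two copies of D per trajectory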
get_bmds_fname
get_bmds_fname (n_change_points, max_t, dim, name='')
Returns consistent file name for segmentation dataset.
create_bm_segmentation_dataset
create_bm_segmentation_dataset (n_traj:int, max_t:int=200, dim:int=1, n_change_points:int=1, Ds:collections.abc.Iterable|None=None, path:pathlib.Path|str|None=None, save:bool=True, name:str='', margin=10, random_lengths=False)
Creates a segmentation dataset to tell between diffusion coefficients.
| | Type | Default | Details |
|---|---|---|---|
| n_traj | int | | Number of trajectories |
| max_t | int | 200 | Maximum trajectory length |
| dim | int | 1 | Trajectory dimension |
| n_change_points | int | 1 | Number of changepoints in the trajectories |
| Ds | collections.abc.Iterable or None | None | Diffusion coefficients to consider. Defaults to logspace(-3, 3) |
| path | pathlib.Path, str or None | None | Path to save the data |
| save | bool | True | Whether to save the data |
| name | str | | Optional name for the dataset |
| margin | int | 10 | |
| random_lengths | bool | False | |
| Returns | DataFrame | | |
The behaviour is very similar to create_andi_segmentation_dataset. Let's see an example.
N = 100
df = create_bm_segmentation_dataset(N, Ds=[10, 50, 100], save=False)
x0 = df.iloc[1]
x0.exps, x0.cp
(tensor([100., 10.]), tensor([54]))
The trajectory changes from \(D=100\) to \(D=10\) at the 54th time-step.
Code
plt.figure(figsize=(7,4))
plt.plot(x0.x[0])
plt.vlines(x0.cp.item(), min(x0.x[0]), max(x0.x[0]), linestyles='dashed', colors='k')
plt.grid()
plt.title("Trajectory with one change point")
plt.xlabel("Time step")
plt.ylabel("Position");
ATTM trajectories
Annealed transit time (ATTM) is an anomalous diffusion model consisting of piecewise normal diffusion. Every segment has a random diffusion coefficient \(D\) drawn with probability \[P(D) = \frac{D^{\sigma - 1}\exp(-D/b)}{b^\sigma\Gamma(\sigma)}\,,\] with parameters \(\sigma\) and \(b\). The residence time in each diffusive state, \(\tau\), depends on the magnitude of the diffusion coefficient: \[P_\tau(\tau|D)=\frac{D^\gamma}{k}\exp(-\tau D^\gamma/k)\,,\] with parameters \(\gamma\) and \(k\). The parameters \(\sigma\) and \(\gamma\) in these distributions determine the anomalous exponent \(\alpha=\sigma/\gamma\), whenever \(\sigma<\gamma<\sigma+1\). Thus, for every \(\alpha\) there are infinitely many valid combinations of \(\sigma, \,\gamma\).
In andi_datasets, these parameters are randomly sampled given a fixed \(\alpha\). Here, we wish to extract \(\sigma\) and \(\gamma\) and, thus, we need a consistent way to simulate ATTM trajectories with those parameters fixed.
In our work, we use ATTM trajectories to show how to characterize anomalous diffusion directly from changes in normal diffusion.
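For intuition, note that \(P(D)\) is a Gamma distribution with shape \(\sigma\) and scale \(b\), and \(P_\tau(\tau|D)\) is an exponential distribution with rate \(D^\gamma/k\). A single diffusive state could thus be sampled as follows (a sketch; the defaults for \(b\) and \(k\) are arbitrary illustration choices):
import numpy as np

def sample_attm_state(sigma, gamma, b=1., k=1., seed=None):
    rng = np.random.default_rng(seed)
    D = rng.gamma(shape=sigma, scale=b)        # P(D): Gamma with shape sigma, scale b
    tau = rng.exponential(scale=k / D**gamma)  # P(tau|D): exponential with rate D**gamma / k
    return D, tau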
create_fixed_attm_trajs
create_fixed_attm_trajs (n_traj, max_t, sigma, gamma)
Creates 2-D ATTM trajectories with fixed sigma and gamma.
Let’s generate some ATTM trajectories with fixed \(\sigma=0.3\) and \(\gamma=0.4\), meaning that \(\alpha=0.75\).
n_traj = 2
max_t = 15
sigma, gamma = 0.3, 0.4
trajs, ds = create_fixed_attm_trajs(n_traj, max_t, sigma, gamma)
Code
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(7, 3), constrained_layout=True)
axes[0].plot(trajs[0].T)
axes[0].set_xlabel("Time", fontsize=16)
axes[0].set_ylabel("Position", fontsize=16)
axes[0].tick_params(labelsize=14)

axes[1].semilogy(ds[0])
axes[1].set_xlabel("Time", fontsize=16)
axes[1].set_ylabel("D", fontsize=16)
axes[1].tick_params(labelsize=14)
Datasets with variable number of change points
So far, we have shown how to build datasets with a fixed number of changepoints. To make a dataset that contains a variable number of changepoints, we can simply combine several such datasets.
This will make our models much more robust and allow us to generalize better to experimental data with an arbitrary number of changes along the trajectories.
combine_datasets
combine_datasets (datasets, shuffle=True)
Combines several datasets into one.
Example: Brownian motion
With this code, we have generated the main test set of Brownian motion trajectories to validate our models.
N_per_set = 12000
max_t = 200
dim = 2
Ds = np.logspace(-3, 3, 1000)
cps = [1, 2, 3, 4]
# Choose a function to create datasets: either Brownian motion or anomalous diffusion
ds_fun = partial(create_bm_segmentation_dataset,
                 max_t=max_t, dim=dim, Ds=Ds, save=False)

datasets = [ds_fun(N_per_set, n_change_points=n_cp) for n_cp in cps]
dataset = combine_datasets(datasets)

save_path = DATA_PATH/get_bmds_fname(f'{min(cps)}_to_{max(cps)}', max_t, dim, 'test')
dataset.to_pickle(save_path)
Example: anomalous diffusion
With this code, we have generated the main test set of anomalous diffusion trajectories to validate our models.
N_per_set = 12500
max_t = 200
dim = 2
cps = [1, 2, 3, 4]
# Choose a function to create datasets: either Brownian motion or anomalous diffusion
ds_fun = partial(create_andi_segmentation_dataset,
                 models=models, max_t=max_t, dim=dim, noise=[0.], save=False)

datasets = [ds_fun(N_per_set, n_change_points=n_cp) for n_cp in cps]
dataset = combine_datasets(datasets)

save_path = DATA_PATH/get_andids_fname(f'{min(cps)}_to_{max(cps)}', max_t, dim, 'test')
dataset.to_pickle(save_path)
Validation with AnDi
It is useful to validate our models over the AnDi challenge data sets to compare them with the top performing models of the challenge.
By default, we assume the data is in DATA_PATH/andi_val_{task_number}.
load_andi_data
load_andi_data (dim=1, task=1, path=None)
Loads data from AnDi.
df = load_andi_data(dim=2)
df.shape
(10000, 4)
df.head(2)
| dim | len | x | y
---|---|---|---|---
0 | 2 | 200 | [[tensor(0.), tensor(-0.5156), tensor(-0.8383), tensor(-1.4615), tensor(-1.7552), tensor(-1.6362), tensor(-1.4192), tensor(-1.3548), tensor(-1.2144), tensor(-1.7406), tensor(-1.6199), tensor(-1.4474), tensor(-0.2902), tensor(0.5447), tensor(0.5666), tensor(0.1618), tensor(-0.1004), tensor(-0.1377), tensor(-1.1998), tensor(-1.9200), tensor(-3.1992), tensor(-4.4678), tensor(-5.7045), tensor(-6.3007), tensor(-6.6895), tensor(-7.5415), tensor(-7.9907), tensor(-9.3120), tensor(-10.1935), tensor(-10.5246), tensor(-11.5122), tensor(-12.5124), tensor(-13.6076), tensor(-13.9701), tensor(-14.3846), ... | [] |
1 | 2 | 200 | [[tensor(0.), tensor(1.8434), tensor(2.5901), tensor(3.3360), tensor(3.3675), tensor(5.1754), tensor(6.7046), tensor(6.5266), tensor(5.5823), tensor(4.2652), tensor(2.6675), tensor(2.0960), tensor(1.3606), tensor(-0.0349), tensor(-1.4965), tensor(-0.4068), tensor(-1.8796), tensor(-1.7324), tensor(-1.8430), tensor(-0.4918), tensor(0.8988), tensor(1.1704), tensor(2.8995), tensor(3.4126), tensor(3.3071), tensor(4.3777), tensor(4.9901), tensor(6.3406), tensor(7.1728), tensor(8.1412), tensor(9.1267), tensor(9.0713), tensor(9.1243), tensor(11.2437), tensor(12.1926), tensor(12.3969), tensor(12.98... | [] |
DataLoaders
Data loaders are essential to train our machine learning models. We provide functions that load the data and return appropriate fastai.DataLoaders for the segmentation tasks.
load_dataset
load_dataset (n_change=1, max_t=200, dim=1, name='', path=None, bm=False)
Loads a dataset according to n_change, max_t and dim, or straight from path.
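For example, to retrieve the Brownian motion test set saved above (a sketch; we assume load_dataset rebuilds the same file name via get_bmds_fname, so the file must already exist under DATA_PATH):
df = load_dataset(n_change='1_to_4', max_t=200, dim=2, name='test', bm=True)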
get_segmentation_dls
get_segmentation_dls (target:str='y_mod', models:list|None=None, exps:collections.abc.Iterable|None=None, size:int|None=None, bs:int=128, split_pct:float=0.2, shuffle:bool=True, tfm_y:collections.abc.Callable|None=None, n_change:int|str=1, bm:bool=False, max_t=200, dim=1, name='', path=None)
Obtain DataLoaders from the dataset, filtered by models and exps, to predict target.
| | Type | Default | Details |
|---|---|---|---|
| target | str | y_mod | Task target: y_mod, y_exp or y for both. |
| models | list or None | None | List of models to consider. Defaults to all. |
| exps | collections.abc.Iterable or None | None | List of anomalous exponents to consider. Defaults to all. |
| size | int or None | None | Maximum dataset size. Defaults to the full dataset. |
| bs | int | 128 | Batch size. |
| split_pct | float | 0.2 | Validation set split percentage from training data. |
| shuffle | bool | True | Shuffle the dataset. |
| tfm_y | collections.abc.Callable or None | None | Transformation to apply to the target, e.g., torch.log10. |
| n_change | int or str | 1 | Number of changes in the trajectories, e.g., '1_to_4'. |
| bm | bool | False | Whether the data is Brownian motion (False for anomalous diffusion). |
| max_t | int | 200 | |
| dim | int | 1 | |
| name | str | | |
| path | NoneType | None | |
| Returns | DataLoaders | | |
This is the main function to obtain DataLoaders for a segmentation task. These provide the trajectories with their corresponding labels at every time step.
Let's see an example for a classification task (setting target='y_mod'). We take a sub-sample of the dataset of size=100 with a batch size bs=4.
seg_dls = get_segmentation_dls(size=100, bs=4)
x, y = seg_dls.one_batch()
x.shape, y.shape
(torch.Size([4, 200, 1]), torch.Size([4, 200]))
In this example, we have 1-dimensional trajectories with 200 time steps. Thus, the target is also 200 frames long.
x[0, :5], y[0, :5]
(tensor([[0.0000],
[3.3110],
[4.2102],
[7.4056],
[4.6176]]),
tensor([3, 3, 3, 3, 3]))
SegmentationTransform
SegmentationTransform (target='y_mod', n_class=5, init_token=None)
Delegates (__call__, decode, setup) to (encodes, decodes, setups) if split_idx matches.
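Presumably, this is the transform that get_transformer_dls below uses to one-hot encode the pointwise labels and prepend the beginning-of-sequence token; a hedged construction with the defaults from the signature:
tfm = SegmentationTransform(target='y_mod', n_class=5, init_token=DEFAULT_TOKEN)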
get_transformer_dls
get_transformer_dls (target='y_mod', dim=1, n_change=1, max_t=200, valid_pct=0.2, models=None, exps=None, size=None, data_path=None, init_token=None, bs:int=64, shuffle_train:bool=None, shuffle:bool=True, val_shuffle:bool=False, n:int=None, path:str|Path='.', dl_type:TfmdDL=None, dl_kwargs:list=None, device:torch.device=None, drop_last:bool=None, val_bs:int=None)
Obtain DataLoaders from the filtered dataset, prepared for transformer training.
| | Type | Default | Details |
|---|---|---|---|
| target | str | y_mod | |
| dim | int | 1 | |
| n_change | int | 1 | |
| max_t | int | 200 | |
| valid_pct | float | 0.2 | |
| models | NoneType | None | |
| exps | NoneType | None | |
| size | NoneType | None | |
| data_path | NoneType | None | |
| init_token | NoneType | None | |
| bs | int | 64 | Batch size |
| shuffle_train | bool | None | (Deprecated, use shuffle) Shuffle training DataLoader |
| shuffle | bool | True | Shuffle training DataLoader |
| val_shuffle | bool | False | Shuffle validation DataLoader |
| n | int | None | Size of Datasets used to create DataLoader |
| path | str or Path | . | Path to put in DataLoaders |
| dl_type | TfmdDL | None | Type of DataLoader |
| dl_kwargs | list | None | List of kwargs to pass to individual DataLoaders |
| device | torch.device | None | Device to put DataLoaders |
| drop_last | bool | None | Drop last incomplete batch; defaults to shuffle |
| val_bs | int | None | Validation batch size; defaults to bs |
Transformer data loaders provide three items: the raw trajectory (src), a one-hot encoding of the labels (tgt) and the labels to compute the loss. Notice in the example below that the first row of the one-hot target is the beginning-of-sequence token (DEFAULT_TOKEN=-1). These are mainly intended for pointwise classification tasks, such as inferring the anomalous diffusion model behind the simulation; otherwise, the one-hot encoding does not make much sense.
tfm_dls = get_transformer_dls(size=100, bs=4)
x, y_one_hot, y = tfm_dls.one_batch()
x.shape, y_one_hot.shape, y.shape
(torch.Size([4, 200, 1]), torch.Size([4, 200, 5]), torch.Size([4, 200]))
idx = 0
x[idx, :5], y_one_hot[idx, :5], y[idx, :5]
(tensor([[ 0.0000],
[-0.0976],
[-0.0456],
[ 0.0426],
[ 0.0395]], device='cuda:0'),
tensor([[-1., -1., -1., -1., -1.],
[ 0., 0., 1., 0., 0.],
[ 0., 0., 1., 0., 0.],
[ 0., 0., 1., 0., 0.],
[ 0., 0., 1., 0., 0.]], device='cuda:0'),
tensor([2, 2, 2, 2, 2], device='cuda:0'))
get_andi_valid_dls
get_andi_valid_dls (bs=128, pct=1, dim=1, task=1, path=None)
Obtain DataLoaders from AnDi challenge validation data.
When validating on the AnDi challenge data, in some cases it may be convenient to work with DataLoaders instead of DataFrames, which can be obtained through load_andi_data. These are mainly intended to test our models; thus, notice that the train/validation split percentage is set to 100% validation by default.
andi_dls = get_andi_valid_dls(dim=2)
x, y = andi_dls.valid.one_batch()
x.shape, y.shape
(torch.Size([128, 200, 2]), torch.Size([128, 0]))
x[0, :5]
tensor([[ 0.0000, 0.0000],
[-0.5156, 0.5875],
[-0.8383, 0.5096],
[-1.4615, 0.5320],
[-1.7552, 0.4109]])