Functions & Loaders

Functions for loading multiple datasets at once.

bgd.loaders.get_all_datasets(transforms, num=5000, mol_only=False)

Get all datasets for training and validation, in that order.

Parameters:

transforms (list) – List of data transformations to apply to the datasets.
num (int, optional) – Number of samples to load from each dataset. Defaults to 5000.
mol_only (bool, optional) – Flag indicating whether to include only chemical datasets. Defaults to False.

Returns:

A tuple containing two elements:

datasets (list): A list of all the datasets.
all_names (list): A list of names corresponding to each dataset.

Return type:

tuple
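
Because the two returned lists are described as parallel, a caller may want to pair each name with its dataset. The following is a minimal sketch using stand-in lists rather than real bgd output; the dataset contents are placeholders, and only the names "cora" and "trees" are taken from this page.

```python
# Stand-ins for the two returned lists; real entries would be dataset objects.
datasets = [["g0", "g1", "g2"], ["g3", "g4"]]
all_names = ["cora", "trees"]

# The lists are parallel, so zip pairs each name with its dataset.
by_name = dict(zip(all_names, datasets))

for name, dataset in by_name.items():
    print(f"{name}: {len(dataset)} samples")
```

A dictionary keyed by name makes it easy to look up a single dataset later without tracking list indices.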

bgd.loaders.get_datasets(transforms, num, stage='train', exclude=None, include=None)

Retrieves and transforms a list of datasets based on specified inclusion and exclusion criteria.

Parameters:

transforms (function) – A function to apply transformations to each dataset.
num (int) – The number of data points to include in each dataset.
stage (str, optional) – The stage of data processing (e.g., “train”, “test”, “validate”). Default is “train”.
exclude (list or str, optional) – A list or a single string specifying dataset names to exclude from the selection. If None, no datasets will be excluded. Default is None.
include (list or str, optional) – A list or a single string specifying dataset names to include in the selection. If None, all datasets not in the exclude list will be included. Default is None.

Returns:

A tuple containing two elements:

datasets (list): A list of transformed datasets.
names (list): A list of names of the selected datasets.

Return type:

tuple

Notes

If both exclude and include are provided, the function first applies the exclude filter and then the include filter.
The function checks for the existence of a “bgd_files” directory and creates it if it does not exist.
The function supports various datasets, including predefined datasets and those from the Open Graph Benchmark (OGB) and TU datasets.
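
The filter order and the directory check described in these notes can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: `select_names` and `ensure_data_dir` are hypothetical helpers, and the list of available names is made up from names that appear on this page.

```python
import tempfile
from pathlib import Path


def select_names(available, exclude=None, include=None):
    """Hypothetical helper: apply the exclude filter first, then include."""
    # Accept a single string or a list, as the parameter docs describe.
    if isinstance(exclude, str):
        exclude = [exclude]
    if isinstance(include, str):
        include = [include]

    # Step 1: drop excluded names (None means nothing is excluded).
    names = [n for n in available if n not in (exclude or [])]
    # Step 2: if include is given, keep only names that are listed in it.
    if include is not None:
        names = [n for n in names if n in include]
    return names


def ensure_data_dir(root="."):
    """Hypothetical helper: create the "bgd_files" directory if missing."""
    path = Path(root) / "bgd_files"
    path.mkdir(parents=True, exist_ok=True)
    return path


available = ["cora", "reddit", "trees", "random"]  # illustrative names only
selected = select_names(available, exclude=["reddit"], include=["cora", "trees"])
print(selected)  # ['cora', 'trees']

# Use a temporary root so the sketch does not touch the working directory.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = ensure_data_dir(tmp)
    created = data_dir.is_dir()
```

Under this reading, a name listed in both exclude and include is dropped, because the exclude step runs first.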

Example

>>> def dummy_transform(dataset):
...     return dataset
>>> datasets, names = get_datasets(dummy_transform, num=100, stage="train", exclude=["reddit"], include=["cora", "trees"])
>>> print(names)
['cora', 'trees']

bgd.loaders.get_edge_task_datasets(transforms, num=5000, stage='train')

Returns datasets with edge-level tasks, both regression and classification.

Parameters:

transforms (list) – List of data transformations to apply to the datasets.
num (int, optional) – Number of samples to load from each dataset. Defaults to 5000.
stage (str, optional) – Stage of the datasets to retrieve. Defaults to “train”.

Returns:

List of edge task datasets (torch_geometric.data.InMemoryDataset).

Return type:

list
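
The transforms parameter is documented as a list of transformations. One plausible reading is that they are applied to each sample in list order. The sketch below illustrates that idea with plain dictionaries; `apply_transforms` and both stand-in transformations are hypothetical, and how bgd actually applies the list is not stated here. PyTorch Geometric offers torch_geometric.transforms.Compose for this kind of chaining on real graph objects.

```python
def apply_transforms(sample, transforms):
    """Hypothetical helper: run each transformation in list order."""
    for transform in transforms:
        sample = transform(sample)
    return sample


# Stand-in transformations on a dict; real ones would act on graph objects.
def add_degree_flag(sample):
    return {**sample, "has_degree": True}


def scale_features(sample):
    return {**sample, "x": [2 * v for v in sample["x"]]}


sample = {"x": [1, 2, 3]}
result = apply_transforms(sample, [add_degree_flag, scale_features])
print(result)
```

Each stand-in returns a new dictionary instead of mutating its input, so the original sample is left unchanged.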

bgd.loaders.get_graph_classification_datasets(transforms, num=5000, stage='train')

Returns datasets with graph classification tasks.

Parameters:

transforms (list) – List of data transformations to apply to the datasets.
num (int, optional) – Number of samples to load from each dataset. Defaults to 5000.
stage (str, optional) – Stage of the datasets to retrieve. Defaults to “train”.

Returns:

List of graph classification datasets (torch_geometric.data.InMemoryDataset).

Return type:

list

bgd.loaders.get_graph_regression_datasets(transforms, num=5000, stage='train')

Returns datasets with graph regression tasks.

Parameters:

transforms (list) – List of data transformations to apply to the datasets.
num (int, optional) – Number of samples to load from each dataset. Defaults to 5000.
stage (str, optional) – Stage of the datasets to retrieve. Defaults to “train”.

Returns:

List of graph regression datasets (torch_geometric.data.InMemoryDataset).

Return type:

list

bgd.loaders.get_graph_task_datasets(transforms, num=5000, stage='train')

Returns datasets with graph-level tasks, both regression and classification.

Parameters:

transforms (list) – List of data transformations to apply to the datasets.
num (int, optional) – Number of samples to load from each dataset. Defaults to 5000.
stage (str, optional) – Stage of the datasets to retrieve. Defaults to “train”.

Returns:

List of graph task datasets (torch_geometric.data.InMemoryDataset).

Return type:

list

bgd.loaders.get_node_task_datasets(transforms, num=5000, stage='train')

Returns datasets with node-level tasks, both classification and regression.

Parameters:

transforms (list) – List of data transformations to apply to the datasets.
num (int, optional) – Number of samples to load from each dataset. Defaults to 5000.
stage (str, optional) – Stage of the datasets to retrieve. Defaults to “train”.

Returns:

List of node task datasets (torch_geometric.data.InMemoryDataset).

Return type:

list

bgd.loaders.get_test_datasets(transforms, num=2000, mol_only=False, exclude=['community', 'trees', 'random'], include=None)

Get the test split of each dataset.

Parameters:

transforms (list) – List of data transformations to apply.
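num (int, optional) – Number of samples to include from each dataset. Defaults to 2000.
mol_only (bool, optional) – Flag indicating whether to include only chemical datasets. Defaults to False.
exclude (list, optional) – Dataset names to exclude. Defaults to ['community', 'trees', 'random'].
include (list, optional) – Dataset names to include. Defaults to None.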

Returns:

A tuple containing two elements:

datasets (list): List of test datasets.
names (list): List of dataset names.

Return type:

tuple

bgd.loaders.get_train_datasets(transforms, num=2000, mol_only=False, exclude=['ogbg-molpcba'], include=None)

Get the training splits of each dataset.

Parameters:

transforms (list) – List of data transformations to apply.
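num (int, optional) – Number of samples to include from each dataset. Defaults to 2000.
mol_only (bool, optional) – Flag indicating whether to include only chemical datasets. Defaults to False.
exclude (list, optional) – Dataset names to exclude. Defaults to ['ogbg-molpcba'].
include (list, optional) – Dataset names to include. Defaults to None.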

Returns:

A tuple containing two elements:

datasets (list): A list of all the datasets.
all_names (list): A list of names corresponding to each dataset.

Return type:

tuple

bgd.loaders.get_val_datasets(transforms, num=2000, mol_only=False, exclude=['community', 'trees', 'random'], include=None)

Get validation splits for each dataset.

Parameters:

transforms (list) – List of data transformations to apply.
num (int, optional) – Number of samples in datasets to include. Defaults to 2000.
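mol_only (bool, optional) – Flag indicating whether to include only chemical datasets. Defaults to False.
exclude (list, optional) – Dataset names to exclude. Defaults to ['community', 'trees', 'random'].
include (list, optional) – Dataset names to include. Defaults to None.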

Returns:

A tuple containing two elements:

datasets (list): List of validation datasets.
names (list): List of dataset names.

Return type:

tuple
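
The num and mol_only parameters shared by the split loaders both narrow what is returned. The sketch below shows one way such narrowing could work on stand-in data. It is an assumption-laden illustration: `narrow` and `CHEMICAL_NAMES` are hypothetical, and which dataset names the library treats as chemical is not stated on this page.

```python
# Hypothetical registry: which names count as chemical is an assumption.
CHEMICAL_NAMES = {"ogbg-molpcba"}


def narrow(datasets, names, num=2000, mol_only=False):
    """Hypothetical helper: optionally keep chemical datasets, then cap size."""
    pairs = list(zip(names, datasets))
    if mol_only:
        pairs = [(n, d) for n, d in pairs if n in CHEMICAL_NAMES]
    kept_names = [n for n, _ in pairs]
    # Slicing keeps at most `num` samples; shorter datasets are unchanged.
    kept_datasets = [d[:num] for _, d in pairs]
    return kept_datasets, kept_names


datasets = [list(range(10)), list(range(3))]  # stand-ins for real datasets
names = ["ogbg-molpcba", "cora"]

val_sets, val_names = narrow(datasets, names, num=5, mol_only=True)
all_sets, all_names = narrow(datasets, names, num=5)
```

Returning the datasets and names together keeps the two lists aligned after filtering, matching the tuple layout documented above.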