Functions & Loaders
Functions to load multiple datasets at once.
- bgd.loaders.get_all_datasets(transforms, num=5000, mol_only=False)
Get all datasets for training and validation, in that order.
- Parameters:
transforms (list) – List of data transformations to apply to the datasets.
num (int, optional) – Number of samples to load from each dataset. Defaults to 5000.
mol_only (bool, optional) – Flag indicating whether to include only chemical datasets. Defaults to False.
- Returns:
- A tuple containing two elements:
datasets (list): A list of all the datasets.
all_names (list): A list of names corresponding to each dataset.
- Return type:
tuple
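Because the return value is a pair of parallel lists, a caller would typically pair each name with its dataset. The sketch below is only an illustration of that pattern: `fake_get_all_datasets` is a hypothetical stand-in that mimics the documented return shape so the snippet runs without `bgd` installed, and the dataset names and contents are placeholders.

```python
def fake_get_all_datasets(transforms, num=5000, mol_only=False):
    """Hypothetical stand-in that mimics the documented return shape only."""
    all_names = ["dataset_a", "dataset_b"]  # placeholder names
    # Each "dataset" is a plain list holding at most three dummy samples.
    datasets = [list(range(min(num, 3))) for _ in all_names]
    return datasets, all_names


datasets, all_names = fake_get_all_datasets(transforms=[], num=2)

# Pair each dataset with its name for easier lookup.
by_name = dict(zip(all_names, datasets))
for name, dataset in by_name.items():
    print(f"{name}: {len(dataset)} samples")
```

With the real loader, the same unpacking and `zip` pattern should apply as long as the two lists stay aligned, which is what the return description suggests.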
- bgd.loaders.get_datasets(transforms, num, stage='train', exclude=None, include=None)
Retrieves and transforms a list of datasets based on specified inclusion and exclusion criteria.
- Parameters:
transforms (function) – A function to apply transformations to each dataset.
num (int) – The number of data points to include in each dataset.
stage (str, optional) – The stage of data processing (e.g., “train”, “test”, “validate”). Default is “train”.
exclude (list or str, optional) – A list or a single string specifying dataset names to exclude from the selection. If None, no datasets will be excluded. Default is None.
include (list or str, optional) – A list or a single string specifying dataset names to include in the selection. If None, all datasets not in the exclude list will be included. Default is None.
- Returns:
- A tuple containing two elements:
datasets (list): A list of transformed datasets.
names (list): A list of names of the selected datasets.
- Return type:
tuple
Notes
If both exclude and include are provided, the function first applies the exclude filter and then the include filter.
The function checks for the existence of a “bgd_files” directory and creates it if it does not exist.
The function supports various datasets, including predefined datasets and those from the Open Graph Benchmark (OGB) and TU datasets.
Example
>>> def dummy_transform(dataset):
...     return dataset
>>> datasets, names = get_datasets(dummy_transform, num=100, stage="train", exclude=["reddit"], include=["cora", "trees"])
>>> print(names)
['cora', 'trees']
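The filter order and the directory check described in the Notes can be sketched as follows. This is not the library's implementation: `select_names` and `ensure_data_dir` are hypothetical helpers, the catalogue of names is a placeholder, and the sketch assumes that the result keeps the catalogue's order. The directory is created under a temporary folder here so the snippet leaves nothing behind.

```python
import os
import tempfile


def select_names(available, exclude=None, include=None):
    """Hypothetical helper: apply the exclude filter, then the include filter."""
    # Accept either a single name or a list of names, as the parameters allow.
    if isinstance(exclude, str):
        exclude = [exclude]
    if isinstance(include, str):
        include = [include]

    names = [n for n in available if n not in (exclude or [])]
    if include is not None:
        names = [n for n in names if n in include]
    return names


def ensure_data_dir(root):
    """Create a "bgd_files" directory under `root` if it is missing."""
    path = os.path.join(root, "bgd_files")
    os.makedirs(path, exist_ok=True)  # no error if it already exists
    return path


available = ["cora", "reddit", "trees", "community"]  # placeholder catalogue
selected = select_names(available, exclude=["reddit"], include=["cora", "trees"])
print(selected)

with tempfile.TemporaryDirectory() as tmp:
    data_dir = ensure_data_dir(tmp)
    dir_created = os.path.isdir(data_dir)
```

Under these assumptions, a name that appears in both `exclude` and `include` is dropped, because the exclude filter runs first.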
- bgd.loaders.get_edge_task_datasets(transforms, num=5000, stage='train')
Returns datasets with edge-level tasks, both regression and classification.
- Parameters:
transforms (list) – List of data transformations to apply to the datasets.
num (int, optional) – Number of samples to load from each dataset. Defaults to 5000.
stage (str, optional) – Stage of the datasets to retrieve. Defaults to “train”.
- Returns:
List of edge task datasets (torch_geometric.data.InMemoryDataset).
- Return type:
list
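Each loader on this page takes a `transforms` argument, usually documented as a list of data transformations. How the loaders apply that list is not stated here. A minimal sketch, assuming the transformations are applied to each sample in list order, is shown below; `apply_transforms`, `drop_self_loops`, and `add_num_edges` are hypothetical names, and a plain dict stands in for a graph object.

```python
def apply_transforms(sample, transforms):
    """Apply each transformation in list order (assumed behaviour)."""
    for transform in transforms:
        sample = transform(sample)
    return sample


# Hypothetical transformations acting on a dict-based stand-in for a graph.
def drop_self_loops(sample):
    edges = [(u, v) for u, v in sample["edges"] if u != v]
    return {**sample, "edges": edges}


def add_num_edges(sample):
    return {**sample, "num_edges": len(sample["edges"])}


graph = {"edges": [(0, 1), (1, 1), (1, 2)]}
result = apply_transforms(graph, [drop_self_loops, add_num_edges])
print(result)
```

Order matters in this sketch: counting edges before dropping self-loops would record three edges instead of two.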
- bgd.loaders.get_graph_classification_datasets(transforms, num=5000, stage='train')
Returns datasets with graph classification tasks.
- Parameters:
transforms (list) – List of data transformations to apply to the datasets.
num (int, optional) – Number of samples to load from each dataset. Defaults to 5000.
stage (str, optional) – Stage of the datasets to retrieve. Defaults to “train”.
- Returns:
List of graph classification datasets (torch_geometric.data.InMemoryDataset).
- Return type:
list
- bgd.loaders.get_graph_regression_datasets(transforms, num=5000, stage='train')
Returns datasets with graph regression tasks.
- Parameters:
transforms (list) – List of data transformations to apply to the datasets.
num (int, optional) – Number of samples to load from each dataset. Defaults to 5000.
stage (str, optional) – Stage of the datasets to retrieve. Defaults to “train”.
- Returns:
List of graph regression datasets (torch_geometric.data.InMemoryDataset).
- Return type:
list
- bgd.loaders.get_graph_task_datasets(transforms, num=5000, stage='train')
Returns datasets with graph-level tasks, both regression and classification.
- Parameters:
transforms (list) – List of data transformations to apply to the datasets.
num (int, optional) – Number of samples to load from each dataset. Defaults to 5000.
stage (str, optional) – Stage of the datasets to retrieve. Defaults to “train”.
- Returns:
List of graph task datasets (torch_geometric.data.InMemoryDataset).
- Return type:
list
- bgd.loaders.get_node_task_datasets(transforms, num=5000, stage='train')
Returns datasets with node-level tasks, both classification and regression.
- Parameters:
transforms (list) – List of data transformations to apply to the datasets.
num (int, optional) – Number of samples to load from each dataset. Defaults to 5000.
stage (str, optional) – Stage of the datasets to retrieve. Defaults to “train”.
- Returns:
List of node task datasets (torch_geometric.data.InMemoryDataset).
- Return type:
list
- bgd.loaders.get_test_datasets(transforms, num=2000, mol_only=False, exclude=['community', 'trees', 'random'], include=None)
Get the test split of each dataset.
- Parameters:
transforms (list) – List of data transformations to apply.
num (int, optional) – Number of samples to include from each dataset. Defaults to 2000.
mol_only (bool, optional) – Flag indicating whether to include only chemical datasets. Defaults to False.
- Returns:
- A tuple containing two elements:
datasets (list): List of test datasets.
names (list): List of dataset names.
- Return type:
tuple
- bgd.loaders.get_train_datasets(transforms, num=2000, mol_only=False, exclude=['ogbg-molpcba'], include=None)
Get the training splits of each dataset.
- Parameters:
transforms (list) – List of data transformations to apply.
num (int, optional) – Number of samples to include from each dataset. Defaults to 2000.
mol_only (bool, optional) – Flag indicating whether to include only chemical datasets. Defaults to False.
- Returns:
- A tuple containing two elements:
datasets (list): List of training datasets.
all_names (list): A list of names corresponding to each dataset.
- Return type:
tuple
- bgd.loaders.get_val_datasets(transforms, num=2000, mol_only=False, exclude=['community', 'trees', 'random'], include=None)
Get validation splits for each dataset.
- Parameters:
transforms (list) – List of data transformations to apply.
num (int, optional) – Number of samples in datasets to include. Defaults to 2000.
mol_only (bool, optional) – Flag indicating whether to include only chemical datasets. Defaults to False.
- Returns:
- A tuple containing two elements:
datasets (list): List of validation datasets.
names (list): List of dataset names.
- Return type:
tuple
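The split loaders differ in their default `exclude` lists: the training signature excludes 'ogbg-molpcba', while the validation and test signatures exclude 'community', 'trees', and 'random'. The sketch below illustrates the effect of those defaults on a placeholder catalogue. It assumes that every split draws from the same set of names, which this page does not confirm, and the catalogue itself is invented for illustration.

```python
# Default `exclude` values copied from the signatures documented above.
DEFAULT_EXCLUDE = {
    "train": ["ogbg-molpcba"],
    "val": ["community", "trees", "random"],
    "test": ["community", "trees", "random"],
}

# Placeholder catalogue; the real set of dataset names is not listed here.
catalogue = ["cora", "community", "trees", "random", "ogbg-molpcba"]

# Names that would remain in each split after applying its default exclusions.
names_per_split = {
    split: [name for name in catalogue if name not in excluded]
    for split, excluded in DEFAULT_EXCLUDE.items()
}
for split, names in names_per_split.items():
    print(split, names)
```

Passing an explicit `exclude` argument to any of the three loaders would replace the default list shown for that split.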