A catalog of the RF100 (RoboFlow 100) collections and EMNIST datasets available in torchvision. This data frame holds metadata about each dataset, including its description, size, available splits, and collection information.

collection_catalog

Format

A data frame with one row per dataset and 17 columns:

collection

Collection name (biology, medical, infrared, damage, underwater, document, mnist)

dataset

Dataset identifier used in collection functions

description

Brief description of the dataset and its purpose

task

Machine learning task type (currently all "object_detection")

num_classes

Number of different object classes

num_images

Total number of images across all splits

image_width

Typical image width in pixels

image_height

Typical image height in pixels

train_size_mb

Size of training split in megabytes

test_size_mb

Size of test split in megabytes

valid_size_mb

Size of validation split in megabytes

total_size_mb

Total size across all splits in megabytes

has_train

Whether the training split is available

has_test

Whether the test split is available

has_valid

Whether the validation split is available

function_name

R function name to load this dataset's collection

roboflow_url

URL to the collection on RoboFlow Universe
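
The split flags and per-split size columns can be combined to plan a download. The sketch below is a minimal illustration that runs without the package: `toy_catalog` is a hypothetical stand-in that reuses the column names listed above, and every value in it is invented. It also assumes the `has_*` columns are logical, which the descriptions suggest but do not state.

```r
# Toy stand-in for collection_catalog. Column names follow the Format
# list; all values (and the dataset names) are invented for illustration.
toy_catalog <- data.frame(
  collection    = c("biology", "biology", "medical"),
  dataset       = c("dataset_a", "dataset_b", "dataset_c"),
  train_size_mb = c(40, 120, 15),
  test_size_mb  = c(5, 20, NA),
  valid_size_mb = c(10, 30, 3),
  has_train     = c(TRUE, TRUE, TRUE),   # assumed to be logical
  has_test      = c(TRUE, TRUE, FALSE),
  has_valid     = c(TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Keep only datasets that ship all three splits
complete <- subset(toy_catalog, has_train & has_test & has_valid)

# Size of the train and validation splits together, skipping the test split
complete$train_valid_mb <- complete$train_size_mb + complete$valid_size_mb

print(complete[, c("dataset", "train_valid_mb")])
```

The same expressions should apply to the real `collection_catalog` once it is loaded, provided the columns have the types assumed here.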

Examples

if (FALSE) { # \dontrun{
# View the complete catalog
data(collection_catalog)
View(collection_catalog)

# See all biology datasets
subset(collection_catalog, collection == "biology")

# Find large datasets (> 100 MB)
subset(collection_catalog, total_size_mb > 100)
} # }
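
Beyond row-wise filtering, the catalog can be summarised per collection. The following is a sketch using base R only. It again uses a hypothetical `toy_catalog` with invented values so that it runs without the package; substitute `collection_catalog` to apply it to the real data.

```r
# Toy stand-in for collection_catalog; values are invented for illustration.
toy_catalog <- data.frame(
  collection    = c("biology", "biology", "medical"),
  dataset       = c("dataset_a", "dataset_b", "dataset_c"),
  num_images    = c(1000, 2500, 400),
  total_size_mb = c(55, 170, 18),
  stringsAsFactors = FALSE
)

# Sum image counts and sizes within each collection
per_collection <- aggregate(
  cbind(num_images, total_size_mb) ~ collection,
  data = toy_catalog,
  FUN = sum
)

# Largest collections first
per_collection <- per_collection[order(-per_collection$total_size_mb), ]

print(per_collection)
```

Note that the formula interface of `aggregate()` drops rows with missing values in the summed columns by default, so datasets with an `NA` size would be left out of the totals.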