Pascal VOC Datasets

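The Pascal VOC datasets are frequently used for training and evaluating semantic segmentation and object detection models. The segmentation variant supports tasks requiring dense, per-pixel annotations; the detection variant provides labelled bounding boxes.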

pascal_segmentation_dataset(
  root = tempdir(),
  year = "2012",
  split = "train",
  transform = NULL,
  target_transform = NULL,
  download = FALSE
)

pascal_detection_dataset(
  root = tempdir(),
  year = "2012",
  split = "train",
  transform = NULL,
  target_transform = NULL,
  download = FALSE
)

Arguments

root: Character. Root directory where the dataset will be stored under root/pascal_voc_<year>.
year: Character. VOC dataset version to use. One of "2007", "2008", "2009", "2010", "2011", or "2012". Default is "2012".
split: Character. One of "train", "val", "trainval", or "test". Determines the dataset split. Default is "train".
transform: Optional. A function that takes an image and returns a transformed version (e.g., normalization, cropping).
target_transform: Optional. A function that takes the target (the y element of an item) and returns a transformed version.
download: Logical. If TRUE, downloads the dataset to root/. If the dataset is already present, download is skipped.
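
As a rough illustration of how root, year, and split fit together, the sketch below resolves the storage directory described above and rejects unsupported values. The helper voc_storage_dir() is hypothetical and not part of the package; the package's own validation may differ.

```r
# Hypothetical helper (not part of the package): validates `year` and `split`
# against the documented choices and builds the path root/pascal_voc_<year>.
voc_storage_dir <- function(root = tempdir(), year = "2012", split = "train") {
  year <- match.arg(year, c("2007", "2008", "2009", "2010", "2011", "2012"))
  split <- match.arg(split, c("train", "val", "trainval", "test"))
  file.path(root, paste0("pascal_voc_", year))
}

# Directory that would hold the 2007 data under a local "data" folder
dir_2007 <- voc_storage_dir(root = "data", year = "2007")
dir_2007
```

An unsupported value such as year = "2013" raises an error from match.arg() before any path is built.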

Value

pascal_segmentation_dataset() returns a torch dataset of class pascal_segmentation_dataset.

The returned list inherits class image_with_segmentation_mask, which allows generic visualization utilities to be applied.

Each element is a named list with the following structure:

x: an H x W x 3 array representing the RGB image.
y: A named list containing:
- masks: A torch_tensor of dtype bool and shape (21, H, W), representing a multi-channel segmentation mask. Each of the 21 channels corresponds to one Pascal VOC class.
- labels: An integer vector indicating the indices of the classes present in the mask.
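
To make the relationship between masks and labels concrete, here is a minimal sketch that uses plain R arrays as a stand-in for the torch tensor. The toy class map and the 1-based class indices (1 and 16) are assumptions chosen for illustration; the package's actual channel ordering is not shown here.

```r
n_classes <- 21L

# Toy 2 x 3 class-index map; the values are hypothetical 1-based class indices.
class_map <- matrix(c(1L,  1L, 16L,
                      1L, 16L, 16L), nrow = 2, byrow = TRUE)

# Build a (21, H, W) logical array: masks[k, i, j] is TRUE when
# pixel (i, j) belongs to class k.
masks <- array(FALSE, dim = c(n_classes, nrow(class_map), ncol(class_map)))
for (k in seq_len(n_classes)) {
  masks[k, , ] <- class_map == k
}

# Classes present in the image: channels with at least one TRUE pixel.
labels <- which(apply(masks, 1, any))

dim(masks)
labels
```

In this layout every pixel is TRUE in exactly one channel, so summing over the first dimension gives 1 everywhere.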

pascal_detection_dataset() returns a torch dataset of class pascal_detection_dataset.

The returned list inherits class image_with_bounding_box, which allows generic visualization utilities to be applied.

Each element is a named list:

x: an H x W x 3 array representing the RGB image.
y: a list with:
- labels: a character vector with object class names.
- boxes: a tensor of shape (N, 4) with bounding box coordinates in (xmin, ymin, xmax, ymax) format.
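
The (xmin, ymin, xmax, ymax) layout can be illustrated with a small sketch that derives box widths, heights, and areas. A plain R matrix stands in for the tensor, and the labels and coordinates are made-up values, not taken from the dataset.

```r
# Stand-in for an item's y element: plain R objects instead of torch tensors.
y <- list(
  labels = c("dog", "person"),
  boxes = matrix(
    c(48, 240, 195, 371,
       8,  12, 352, 498),
    ncol = 4, byrow = TRUE,
    dimnames = list(NULL, c("xmin", "ymin", "xmax", "ymax"))
  )
)

# One row per box; columns follow the (xmin, ymin, xmax, ymax) convention.
box_width  <- y$boxes[, "xmax"] - y$boxes[, "xmin"]
box_height <- y$boxes[, "ymax"] - y$boxes[, "ymin"]
box_area   <- box_width * box_height

data.frame(label = y$labels, width = box_width, height = box_height, area = box_area)
```

With a real item, the boxes tensor would first need to be converted to an R matrix (for example with as.array()) before indexing columns by name as done here.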

Examples

if (FALSE) { # \dontrun{
# Load Pascal VOC segmentation dataset (2007 train split)
pascal_seg <- pascal_segmentation_dataset(
 transform = transform_to_tensor,
 download = TRUE,
 year = "2007"
)

# Access the first image and its mask
first_item <- pascal_seg[1]
first_item$x  # Image
first_item$y$masks  # Segmentation mask
first_item$y$labels  # Unique class labels in the mask
pascal_voc_classes(first_item$y$labels)  # Class names

# Visualise the first image and its mask
masked_img <- draw_segmentation_masks(first_item)
tensor_image_browse(masked_img)

# Load Pascal VOC detection dataset (2007 train split)
pascal_det <- pascal_detection_dataset(
 transform = transform_to_tensor,
 download = TRUE,
 year = "2007"
)

# Access the first image and its bounding boxes
first_item <- pascal_det[1]
first_item$x  # Image
first_item$y$labels  # Object labels
first_item$y$boxes  # Bounding box tensor

# Visualise the first image with bounding boxes
boxed_img <- draw_bounding_boxes(first_item)
tensor_image_browse(boxed_img)
} # }
