This dataset is frequently used for training and evaluating semantic segmentation models, and supports tasks requiring dense, per-pixel annotations.

pascal_segmentation_dataset(
  root = tempdir(),
  year = "2012",
  split = "train",
  transform = NULL,
  target_transform = NULL,
  download = FALSE
)

pascal_detection_dataset(
  root = tempdir(),
  year = "2012",
  split = "train",
  transform = NULL,
  target_transform = NULL,
  download = FALSE
)

Arguments

root

Character. Root directory where the dataset will be stored under root/pascal_voc_<year>.

year

Character. VOC dataset version to use. One of "2007", "2008", "2009", "2010", "2011", or "2012". Default is "2012".

split

Character. One of "train", "val", "trainval", or "test". Determines the dataset split. Default is "train".

transform

Optional. A function that takes an image and returns a transformed version (e.g., normalization, cropping).

target_transform

Optional. A function that takes the target (the y element of an item) and returns a transformed version.

download

Logical. If TRUE, downloads the dataset to root/. If the dataset is already present, download is skipped.
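The storage location described for root can be checked before deciding whether to pass download = TRUE. The following is a minimal sketch in base R; pascal_voc_dir() is a hypothetical helper, not part of the package, and it assumes the directory name follows the root/pascal_voc_<year> pattern stated above.

```r
# Hypothetical helper (not a package function): builds the directory name
# described in the Arguments section, root/pascal_voc_<year>.
pascal_voc_dir <- function(root = tempdir(), year = "2012") {
  # Reject anything outside the documented VOC versions
  year <- match.arg(year, c("2007", "2008", "2009", "2010", "2011", "2012"))
  file.path(root, paste0("pascal_voc_", year))
}

voc_dir <- pascal_voc_dir(year = "2007")

# TRUE once the dataset directory has been created under root
already_present <- dir.exists(voc_dir)
```

Because an existing dataset is skipped on download, this check is only useful for locating the files or cleaning them up afterwards.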

Value

pascal_segmentation_dataset() returns a torch dataset of class pascal_segmentation_dataset.

Each item of the dataset inherits class image_with_segmentation_mask, which allows generic visualization utilities to be applied.

Each element is a named list with the following structure:

  • x: an H x W x 3 array representing the RGB image.

  • y: A named list containing:

    • masks: A torch_tensor of dtype bool and shape (21, H, W), representing a multi-channel segmentation mask. Each of the 21 channels corresponds to one Pascal VOC class.

    • labels: An integer vector indicating the indices of the classes present in the mask.
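The relationship between masks and labels can be illustrated with a plain logical array standing in for the tensor. This is a sketch under assumptions: it uses a toy (21, 4, 4) array instead of a real item, and it assumes that converting first_item$y$masks with as.array() yields the same (21, H, W) layout. Whether the package numbers channels exactly as shown here is not confirmed by this page.

```r
# Toy stand-in for as.array(first_item$y$masks): logical array of shape (21, H, W)
masks <- array(FALSE, dim = c(21, 4, 4))
masks[1, , ] <- TRUE            # channel 1 covers the whole image ...
masks[1, 2:3, 2:3] <- FALSE     # ... except a 2 x 2 patch
masks[8, 2:3, 2:3] <- TRUE      # channel 8 owns that patch

# Channels that contain at least one TRUE pixel
present <- which(apply(masks, 1, any))

# Collapse the 21 channels into one H x W map holding, for each pixel,
# the index of the first channel that is TRUE (NA if none is)
class_map <- apply(masks, c(2, 3), function(px) which(px)[1])
```

A single-channel map like class_map is often more convenient for plotting or for loss functions that expect one class index per pixel.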

pascal_detection_dataset() returns a torch dataset of class pascal_detection_dataset.

Each item of the dataset inherits class image_with_bounding_box, which allows generic visualization utilities to be applied.

Each element is a named list:

  • x: an H x W x 3 array representing the RGB image.

  • y: a list with:

    • labels: a character vector with object class names.

    • boxes: a tensor of shape (N, 4) with bounding box coordinates in (xmin, ymin, xmax, ymax) format.
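The (xmin, ymin, xmax, ymax) layout of boxes can be worked with as an ordinary N x 4 matrix. The sketch below uses made-up coordinates and assumes that as.array(first_item$y$boxes) would produce a matrix with the same column order; the column names are added here for readability only.

```r
# Toy stand-in for as.array(first_item$y$boxes): N x 4 matrix with
# columns in (xmin, ymin, xmax, ymax) order
boxes <- matrix(
  c(10, 20, 110, 220,
    50, 60,  90, 100),
  ncol = 4, byrow = TRUE,
  dimnames = list(NULL, c("xmin", "ymin", "xmax", "ymax"))
)

# Per-box width, height and area
box_width  <- boxes[, "xmax"] - boxes[, "xmin"]
box_height <- boxes[, "ymax"] - boxes[, "ymin"]
box_area   <- box_width * box_height

# Same boxes in (x, y, width, height) form, as some plotting code expects
boxes_xywh <- cbind(
  x = boxes[, "xmin"], y = boxes[, "ymin"],
  width = box_width, height = box_height
)
```

Filtering on box_area is a simple way to drop very small objects before training or visualisation.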

Examples

if (FALSE) { # \dontrun{
# Load Pascal VOC segmentation dataset (2007 train split)
pascal_seg <- pascal_segmentation_dataset(
 transform = transform_to_tensor,
 download = TRUE,
 year = "2007"
)

# Access the first image and its mask
first_item <- pascal_seg[1]
first_item$x  # Image
first_item$y$masks  # Segmentation mask
first_item$y$labels  # Unique class labels in the mask
pascal_voc_classes(first_item$y$labels)  # Class names

# Visualise the first image and its mask
masked_img <- draw_segmentation_masks(first_item)
tensor_image_browse(masked_img)

# Load Pascal VOC detection dataset (2007 train split)
pascal_det <- pascal_detection_dataset(
 transform = transform_to_tensor,
 download = TRUE,
 year = "2007"
)

# Access the first image and its bounding boxes
first_item <- pascal_det[1]
first_item$x  # Image
first_item$y$labels  # Object labels
first_item$y$boxes  # Bounding box tensor

# Visualise the first image with bounding boxes
boxed_img <- draw_bounding_boxes(first_item)
tensor_image_browse(boxed_img)
} # }