Semantic segmentation models implementing the DeepLabV3 architecture from "Rethinking Atrous Convolution for Semantic Image Segmentation". These models use Atrous Spatial Pyramid Pooling (ASPP) to capture multi-scale context and are available with ResNet-50 and ResNet-101 backbones.

Available Models

  • model_deeplabv3_resnet50()

  • model_deeplabv3_resnet101()

Model Variants and Performance (COCO val2017, VOC labels)

All models are trained on a 20-class subset of COCO that corresponds to Pascal VOC categories, plus background (21 classes total).

| Model                     | mIoU  | Pixel Acc | Params | GFLOPS | File Size | Weights Used              |
|---------------------------|-------|-----------|--------|--------|-----------|---------------------------|
| model_deeplabv3_resnet50  | 66.4% | 92.4%     | 42.0M  | 178.72 | 161 MB    | COCO_WITH_VOC_LABELS_V1   |
| model_deeplabv3_resnet101 | 67.4% | 92.4%     | 61.0M  | 258.74 | 233 MB    | COCO_WITH_VOC_LABELS_V1   |

Weights Selection

  • All models use COCO_WITH_VOC_LABELS_V1 weights, trained on COCO with the 20 Pascal VOC categories (+ background = 21 classes).

  • Backbone weights default to IMAGENET1K_V1 (supervised ImageNet-1k) when pretrained = FALSE and pretrained_backbone = TRUE.

  • When pretrained = TRUE, backbone weights are overridden by the full segmentation model weights and pretrained_backbone is ignored.

  • The auxiliary classifier branch (aux_loss) is enabled automatically when pretrained weights are loaded; when training from scratch, set aux_loss explicitly.
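The rules above can be read as a small decision procedure. The sketch below is illustrative only: resolve_deeplabv3_weights() is a hypothetical helper written for this page, not a function exported by the package, and it mirrors the bullet points rather than the actual implementation. The fallback of aux_loss to FALSE when nothing is pretrained is an assumption.

```r
# Hypothetical helper: mirrors the documented rules, not the package internals.
resolve_deeplabv3_weights <- function(pretrained = FALSE,
                                      pretrained_backbone = FALSE,
                                      aux_loss = NULL) {
  if (pretrained) {
    # Full segmentation weights take over; pretrained_backbone is ignored.
    list(
      model_weights = "COCO_WITH_VOC_LABELS_V1",
      backbone_weights = NA_character_,
      # An unset aux_loss is inferred from the pretrained weights (enabled).
      aux_loss = if (is.null(aux_loss)) TRUE else aux_loss
    )
  } else {
    list(
      model_weights = NA_character_,
      backbone_weights = if (pretrained_backbone) "IMAGENET1K_V1" else NA_character_,
      # Assumption: without pretrained weights, an unset aux_loss means FALSE.
      aux_loss = isTRUE(aux_loss)
    )
  }
}

resolve_deeplabv3_weights(pretrained = TRUE, pretrained_backbone = TRUE)
resolve_deeplabv3_weights(pretrained = FALSE, pretrained_backbone = TRUE)
```

The first call reports the segmentation weights with no separate backbone weights, and the second reports backbone-only ImageNet weights.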

Input Format

Models expect input tensors of shape (batch_size, 3, H, W), normalized with ImageNet mean c(0.485, 0.456, 0.406) and std c(0.229, 0.224, 0.225). Training resolution is 520x520.
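As a minimal sketch of that preprocessing, the normalization can be written directly with torch tensors. The random tensor below is only a stand-in for a decoded RGB image with values in [0, 1]; in practice transform_normalize() performs the same arithmetic.

```r
library(torch)

# Stand-in for a decoded RGB image with values in [0, 1], shape (3, H, W).
img <- torch_rand(3, 520, 520)

# ImageNet statistics quoted above, reshaped to broadcast over H and W.
norm_mean <- torch_tensor(c(0.485, 0.456, 0.406))$view(c(3, 1, 1))
norm_std  <- torch_tensor(c(0.229, 0.224, 0.225))$view(c(3, 1, 1))

# Per-channel normalization: (x - mean) / std.
normalized <- (img - norm_mean) / norm_std

# Add the leading batch dimension (torch for R uses 1-based dims).
batch <- normalized$unsqueeze(1)
batch$shape
```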

Output Format

Returns a named list with:

  • $out — main segmentation logits, shape (batch, num_classes, H, W)

  • $aux — auxiliary logits from an intermediate backbone layer (only when aux_loss = TRUE)
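When training with aux_loss = TRUE, the two outputs are typically combined into a single loss. The sketch below uses random tensors as stand-ins for $out and $aux so that it runs without a model; the 0.5 weight on the auxiliary term is an illustrative assumption, not a value taken from this package.

```r
library(torch)

# Stand-ins for model(batch)$out and model(batch)$aux: (batch, num_classes, H, W).
out <- torch_randn(2, 21, 64, 64)
aux <- torch_randn(2, 21, 64, 64)

# Per-pixel ground-truth labels; torch for R expects 1-based class indices.
target <- torch_randint(1, 22, c(2, 64, 64), dtype = torch_long())

main_loss <- nnf_cross_entropy(out, target)
aux_loss_value <- nnf_cross_entropy(aux, target)

# Assumption: 0.5 is an illustrative weight for the auxiliary term.
loss <- main_loss + 0.5 * aux_loss_value
loss$item()
```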

model_deeplabv3_resnet50(
  pretrained = FALSE,
  progress = TRUE,
  num_classes = 21,
  aux_loss = NULL,
  pretrained_backbone = FALSE,
  ...
)

model_deeplabv3_resnet101(
  pretrained = FALSE,
  progress = TRUE,
  num_classes = 21,
  aux_loss = NULL,
  pretrained_backbone = FALSE,
  ...
)

Arguments

pretrained

Logical. If TRUE, returns a model pre-trained on COCO with Pascal VOC labels (COCO_WITH_VOC_LABELS_V1 weights).

progress

Logical. If TRUE, displays a download progress bar on stderr.

num_classes

Integer. Number of output segmentation classes including background. Default: 21 (Pascal VOC). Set to NULL to infer from pretrained weights.

aux_loss

Logical or NULL. If TRUE, adds an auxiliary FCN classifier head at an intermediate backbone layer, used as a secondary loss during training. If NULL (default), inferred from pretrained weights.

pretrained_backbone

Logical. If TRUE and pretrained = FALSE, loads IMAGENET1K_V1 weights for the backbone only. Ignored when pretrained = TRUE. Default: FALSE.

...

Other parameters passed to the ResNet backbone model.
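For a custom label set, the arguments can be combined as in the sketch below, which builds a randomly initialised model with three output classes and runs a forward pass on random data. The class count and the 128x128 input size are arbitrary choices for illustration; no weights are downloaded because both pretrained flags are FALSE.

```r
library(torch)
library(torchvision)

# Randomly initialised model with 3 output classes and the auxiliary head enabled.
model <- model_deeplabv3_resnet50(
  pretrained = FALSE,
  pretrained_backbone = FALSE,
  num_classes = 3,
  aux_loss = TRUE
)
# Evaluation mode so batch normalization works with a single-image batch.
model$eval()

x <- torch_randn(1, 3, 128, 128)
with_no_grad({
  output <- model(x)
})

names(output)
output$out$shape
```

Per the Output Format section, $out should have shape (batch, num_classes, H, W), matching the spatial size of the input.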

Functions

  • model_deeplabv3_resnet50(): DeepLabV3 with ResNet-50 backbone

  • model_deeplabv3_resnet101(): DeepLabV3 with ResNet-101 backbone

See also

Other semantic_segmentation_model: model_convnext_segmentation, model_fcn_resnet

Examples

if (FALSE) { # \dontrun{
library(magrittr)
norm_mean <- c(0.485, 0.456, 0.406)
norm_std  <- c(0.229, 0.224, 0.225)

url <- paste0("https://upload.wikimedia.org/wikipedia/commons/thumb/",
           "e/ea/Morsan_Normande_vache.jpg/120px-Morsan_Normande_vache.jpg")
img <- base_loader(url)

input <- img %>%
  transform_to_tensor() %>%
  transform_resize(c(520, 520)) %>%
  transform_normalize(norm_mean, norm_std)
batch <- input$unsqueeze(1)    # Add batch dimension: (1, 3, H, W)

# --- ResNet-50 backbone ---
model <- model_deeplabv3_resnet50(pretrained = TRUE)
model$eval()
output <- model(batch)

segmented <- draw_segmentation_masks(input, output$out$squeeze(1))
tensor_image_browse(segmented)

# Show most frequent class
mask_id <- output$out$argmax(dim = 2)  # (1, H, W)
class_contingency_with_background <- mask_id$view(-1)$bincount()
class_contingency_with_background[1] <- 0L # zero the count for the background class (id 1)
top_class_index <- class_contingency_with_background$argmax()$item()
cli::cli_inform("Majority class {.pkg ResNet-50}: {.emph {pascal_voc_classes(top_class_index)}}")

# --- ResNet-101 backbone ---
model <- model_deeplabv3_resnet101(pretrained = TRUE)
model$eval()
output <- model(batch)

segmented <- draw_segmentation_masks(input, output$out$squeeze(1))
tensor_image_browse(segmented)

# Show most frequent class
mask_id <- output$out$argmax(dim = 2)  # (1, H, W)
class_contingency_with_background <- mask_id$view(-1)$bincount()
class_contingency_with_background[1] <- 0L # zero the count for the background class (id 1)
top_class_index <- class_contingency_with_background$argmax()$item()
cli::cli_inform("Majority class {.pkg ResNet-101}: {.emph {pascal_voc_classes(top_class_index)}}")
} # }