Object detection models that combine a ConvNeXt backbone with a Feature Pyramid Network (FPN) and a Faster R-CNN detection head. The architecture mirrors model_fasterrcnn_resnet50_fpn(), with the ResNet backbone replaced by a ConvNeXt variant. The design follows the paper "A ConvNet for the 2020s".

Available Models

  • model_convnext_tiny_detection()

  • model_convnext_small_detection()

  • model_convnext_base_detection()

Backbone Performance (ImageNet-1k)

Accuracy metrics reflect backbone classification performance only. Detection head weights are randomly initialized and must be fine-tuned on task-specific labelled data before meaningful predictions are produced.

| Model                             | Top-1 Acc | Top-5 Acc | Params  | GFLOPS | File Size | Backbone Weights              | Notes                    |
|-----------------------------------|-----------|-----------|---------|--------|-----------|-------------------------------|--------------------------|
| model_convnext_tiny_detection     | 82.5%     | 96.1%     | 28.6M   | 4.46   | 109 MB    | IMAGENET1K_V1                 | Tiny backbone, FPN head  |
| model_convnext_small_detection    | 83.6%     | 96.7%     | 50.2M   | 8.68   | 192 MB    | IMAGENET1K_V1 (22k pretrain)  | Small backbone, FPN head |
| model_convnext_base_detection     | 84.1%     | 96.9%     | 88.6M   | 15.36  | 338 MB    | IMAGENET1K_V1                 | Base backbone, FPN head  |

FPN Channel Configuration

Each ConvNeXt variant produces four feature maps (C2–C5) fed into the FPN. Channel widths differ between Tiny/Small and Base:

| Variant | FPN in_channels          | FPN out_channels |
|---------|--------------------------|------------------|
| Tiny    | c(96, 192, 384, 768)     | 256              |
| Small   | c(96, 192, 384, 768)     | 256              |
| Base    | c(128, 256, 512, 1024)   | 256              |
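The mapping in the table above can be sketched as a small helper. Note that `fpn_in_channels()` is a hypothetical name used for illustration only, not part of the package API:

```r
# Hypothetical helper (not part of the package API) returning the FPN
# input channel widths for each ConvNeXt variant, per the table above.
fpn_in_channels <- function(variant) {
  switch(variant,
    tiny  = c(96, 192, 384, 768),
    small = c(96, 192, 384, 768),
    base  = c(128, 256, 512, 1024),
    stop("unknown variant: ", variant)
  )
}

fpn_in_channels("base")
# all variants share the same FPN output width (256), so detection-head
# weights have the same shape regardless of backbone choice
```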

Weights Selection

  • All variants use IMAGENET1K_V1 backbone weights by default (supervised ImageNet-1k).

  • The Small variant backbone (model_convnext_small_22k) was additionally pretrained on ImageNet-22k prior to fine-tuning on ImageNet-1k.

  • Detection head weights are randomly initialized — bounding-box predictions are meaningless without fine-tuning on labelled detection data.

  • Set pretrained_backbone = TRUE to load ImageNet backbone weights.

model_convnext_tiny_detection(
  num_classes = 91,
  pretrained_backbone = FALSE,
  ...
)

model_convnext_small_detection(
  num_classes = 91,
  pretrained_backbone = FALSE,
  ...
)

model_convnext_base_detection(
  num_classes = 91,
  pretrained_backbone = FALSE,
  ...
)

Arguments

num_classes

Number of output classes including the background class (default: 91 for COCO, i.e. 90 object classes plus background).

pretrained_backbone

Logical. If TRUE, loads ImageNet-pretrained ConvNeXt backbone weights. Default: FALSE.

...

Additional arguments (currently unused).
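For fine-tuning on a custom dataset, `num_classes` follows the COCO convention of counting a background slot (hence the default of 91 = 90 classes + background). A minimal sketch, assuming a dataset with 3 foreground classes:

```r
# Sketch: configure a detector for a dataset with 3 foreground classes.
# Under the COCO convention, num_classes includes the background slot.
num_classes <- 3 + 1  # 3 foreground classes + background

if (FALSE) { # \dontrun{ -- requires the package providing these constructors
  model <- model_convnext_tiny_detection(
    num_classes = num_classes,
    pretrained_backbone = TRUE  # backbone only; the detection head stays random
  )
} # }
```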

Functions

  • model_convnext_tiny_detection(): ConvNeXt Tiny with FPN detection head

  • model_convnext_small_detection(): ConvNeXt Small with FPN detection head

  • model_convnext_base_detection(): ConvNeXt Base with FPN detection head

Note

Detection head weights are randomly initialized. Predicted bounding boxes will be arbitrary until the detection head is trained on labelled data. Only the backbone benefits from pretrained_backbone = TRUE.

See also

Other object_detection_model: model_facenet, model_fasterrcnn, model_maskrcnn

Examples

if (FALSE) { # \dontrun{
library(magrittr)
norm_mean <- c(0.485, 0.456, 0.406) # ImageNet normalization constants
norm_std  <- c(0.229, 0.224, 0.225)

url <- paste0("https://upload.wikimedia.org/wikipedia/commons/thumb/",
              "e/ea/Morsan_Normande_vache.jpg/120px-Morsan_Normande_vache.jpg")
img <- base_loader(url) %>%
  transform_to_tensor() %>%
  transform_resize(c(520, 520))

input <- img %>% transform_normalize(norm_mean, norm_std)
batch <- input$unsqueeze(1)    # Add batch dimension: (1, 3, H, W)

# ConvNeXt Tiny detection
model <- model_convnext_tiny_detection(pretrained_backbone = TRUE)
model$eval()
# Inference can take two minutes or more on CPU
pred      <- model(batch)$detections[[1]]
num_boxes <- as.integer(pred$boxes$size()[1])

# The detection head is untrained, so boxes and labels are arbitrary;
# draw_bounding_boxes() may fail on degenerate box coordinates.
if (num_boxes > 0) {
  k      <- min(5L, num_boxes)
  topk   <- pred$scores$topk(k = k)[[2]]  # indices of the k highest scores
  boxes  <- pred$boxes[topk, ]
  labels <- as.character(as.integer(pred$labels[topk]))  # raw label ids
  boxed  <- draw_bounding_boxes(img, boxes, labels = labels)
  tensor_image_browse(boxed)
}
} # }