R/models-convnext_detection.R
model_convnext_detection.Rd
Object detection models combining a ConvNeXt backbone with a Feature Pyramid
Network (FPN) and the Faster R-CNN detection head. The architecture mirrors
model_fasterrcnn_resnet50_fpn(), with the ResNet backbone replaced by
ConvNeXt variants. The backbone design follows the paper
"A ConvNet for the 2020s" (Liu et al., 2022).
model_convnext_tiny_detection()
model_convnext_small_detection()
model_convnext_base_detection()
Accuracy metrics reflect backbone classification performance only. Detection head weights are randomly initialized and must be fine-tuned on task-specific labelled data before meaningful predictions are produced.
| Model | Top-1 Acc | Top-5 Acc | Params | GFLOPS | File Size | Backbone Weights | Notes |
|-----------------------------------|-----------|-----------|---------|--------|-----------|-------------------------------|--------------------------|
| model_convnext_tiny_detection | 82.5% | 96.1% | 28.6M | 4.46 | 109 MB | IMAGENET1K_V1 | Tiny backbone, FPN head |
| model_convnext_small_detection | 83.6% | 96.7% | 50.2M | 8.68 | 192 MB | IMAGENET1K_V1 (22k pretrain) | Small backbone, FPN head |
| model_convnext_base_detection | 84.1% | 96.9% | 88.6M | 15.36 | 338 MB | IMAGENET1K_V1 | Base backbone, FPN head |

Each ConvNeXt variant produces four feature maps (C2–C5) that are fed into the FPN. Channel widths differ between Tiny/Small and Base.
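As a rough illustration of the four feature maps, the sketch below computes the spatial size of C2–C5 for a square input. It assumes the stage layout described in the ConvNeXt paper: a stride-4 stem followed by three stride-2 downsampling layers, with widths 96/192/384/768 for Tiny and Small and 128/256/512/1024 for Base. These numbers come from the paper, not from this package's source, and `convnext_feature_shapes()` is a hypothetical helper that is not part of the package.

```r
# Hypothetical helper: not part of torchvision. Widths and strides are taken
# from the ConvNeXt paper and are assumed to match this implementation.
convnext_feature_shapes <- function(variant = c("tiny", "small", "base"),
                                    input_size = 520L) {
  variant <- match.arg(variant)
  channels <- if (variant == "base") {
    c(128L, 256L, 512L, 1024L)
  } else {
    c(96L, 192L, 384L, 768L)
  }
  strides <- c(4L, 8L, 16L, 32L)  # cumulative stride of C2..C5
  data.frame(
    level = c("C2", "C3", "C4", "C5"),
    channels = channels,
    stride = strides,
    size = input_size %/% strides  # non-overlapping convolutions floor the size
  )
}

convnext_feature_shapes("tiny", 520L)
```

For a 520 x 520 input this gives feature maps of side 130, 65, 32 and 16.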
When pretrained_backbone = TRUE, all variants load IMAGENET1K_V1 backbone weights (supervised ImageNet-1k).
The Small variant backbone (model_convnext_small_22k) was additionally
pretrained on ImageNet-22k before being fine-tuned on ImageNet-1k.
Detection head weights are randomly initialized, so bounding-box predictions are meaningless without fine-tuning on labelled detection data.
Set pretrained_backbone = TRUE to load ImageNet backbone weights.
model_convnext_tiny_detection(
num_classes = 91,
pretrained_backbone = FALSE,
...
)
model_convnext_small_detection(
num_classes = 91,
pretrained_backbone = FALSE,
...
)
model_convnext_base_detection(
num_classes = 91,
pretrained_backbone = FALSE,
...
)
model_convnext_tiny_detection(): ConvNeXt Tiny with FPN detection head
model_convnext_small_detection(): ConvNeXt Small with FPN detection head
model_convnext_base_detection(): ConvNeXt Base with FPN detection head
Detection head weights are randomly initialized. Predicted bounding boxes
will be arbitrary until the detection head is trained on labelled data.
Only the backbone benefits from pretrained_backbone = TRUE.
Other object_detection_model:
model_facenet,
model_fasterrcnn,
model_maskrcnn
if (FALSE) { # \dontrun{
library(magrittr)
norm_mean <- c(0.485, 0.456, 0.406) # ImageNet normalization constants
norm_std <- c(0.229, 0.224, 0.225)
url <- paste0("https://upload.wikimedia.org/wikipedia/commons/thumb/",
"e/ea/Morsan_Normande_vache.jpg/120px-Morsan_Normande_vache.jpg")
img <- base_loader(url) %>%
transform_to_tensor() %>%
transform_resize(c(520, 520))
input <- img %>% transform_normalize(norm_mean, norm_std)
batch <- input$unsqueeze(1) # Add batch dimension: (1, 3, H, W)
# ConvNeXt Tiny detection
model <- model_convnext_tiny_detection(pretrained_backbone = TRUE)
model$eval()
# Inference can take more than 2 minutes on CPU
pred <- model(batch)$detections[[1]]
num_boxes <- as.integer(pred$boxes$size()[1])
# `draw_bounding_boxes()` may fail if bbox values are not consistent,
# so only draw when at least one box is returned.
if (num_boxes > 0) {
  # Keep at most 5 boxes: topk() errors if k exceeds the number of scores
  topk <- pred$scores$topk(k = min(5, num_boxes))[[2]]
  boxes <- pred$boxes[topk, ]
  labels <- imagenet_classes(as.integer(pred$labels[topk]))
  boxed <- draw_bounding_boxes(img, boxes, labels = labels)
  tensor_image_browse(boxed)
}
} # }
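Selecting the top-k detections needs care when an image yields fewer than k boxes. The base-R sketch below shows the idea on a plain numeric vector of scores. `top_k_indices()` is a hypothetical helper written for illustration, not a torchvision function.

```r
# Hypothetical helper: indices of the k highest scores, or fewer if the
# vector is shorter than k (including the empty case).
top_k_indices <- function(scores, k = 5L) {
  k <- min(k, length(scores))
  order(scores, decreasing = TRUE)[seq_len(k)]
}

scores <- c(0.12, 0.91, 0.40)
top_k_indices(scores, k = 5L)  # all three indices, highest score first
top_k_indices(numeric(0))      # empty result: nothing to draw
```

Clamping k to the number of available scores avoids an out-of-range request, and the empty case returns a zero-length index vector that can be checked before drawing.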