TEJAS Journal of Technologies and Humanitarian Science

ISSN: 2583-5599

Open Access | Quarterly | Peer Reviewed Journal

October, 2025 | Volume 04 | Issue 04

Paper 9: A Hybrid Faster R-CNN and YOLOv5 Model with Transformer Augmentation for Enhanced Object Detection

Authors: Kunal Sahu, Khushi Rajput, Shweta Sinha and Rinku Raheja

DOI: https://doi.org/10.63920/tjths.44009

Abstract

We propose a three-stage model for detecting small objects (smaller than 32x32 pixels), such as backpacks, handbags, and other abandoned items, in security-camera images. First, YOLOv5 generates candidate bounding boxes. Next, Faster R-CNN refines these boxes for more precise localization. Finally, a lightweight Transformer decoder improves detection of the smallest objects. To make the model run in real time, we will compress it with weight pruning and INT8 quantization, reducing its size by 20-30% and targeting 20-30 frames per second on a Jetson Nano. Training will use mixed precision on a custom surveillance dataset focused on small objects. We aim to raise small-object recall above the 30% achieved by the YOLOv5 baseline, which would benefit applications such as autonomous driving, smart security, and farm monitoring. The model will then be evaluated on our dataset, on COCO and KITTI, and on video streams.
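The three stages can be read as a cascade in which each stage consumes the previous stage's output. The Python sketch below shows only that data flow. The stage interfaces (`Proposer`, `Refiner`, `SmallHead`), the `Detection` record, the score threshold, and the final per-class NMS merge are illustrative assumptions, not the authors' design, and the toy stages stand in for real YOLOv5, Faster R-CNN, and Transformer models.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class Detection:
    box: Box
    score: float
    label: str


# Hypothetical stage interfaces; the paper does not specify them.
Proposer = Callable[[Any], List[Detection]]                    # stage 1: YOLOv5-style candidate boxes
Refiner = Callable[[Any, List[Detection]], List[Detection]]    # stage 2: Faster R-CNN-style refinement
SmallHead = Callable[[Any, List[Detection]], List[Detection]]  # stage 3: Transformer decoder for small objects


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(dets: List[Detection], thr: float = 0.5) -> List[Detection]:
    """Greedy per-class non-maximum suppression: keep the highest-scoring box,
    drop same-class boxes overlapping a kept box with IoU >= thr."""
    keep: List[Detection] = []
    for d in sorted(dets, key=lambda d: d.score, reverse=True):
        if all(k.label != d.label or iou(k.box, d.box) < thr for k in keep):
            keep.append(d)
    return keep


def detect(image: Any, propose: Proposer, refine: Refiner, small_head: SmallHead,
           score_thr: float = 0.3, nms_thr: float = 0.5) -> List[Detection]:
    """Run the three stages in sequence, then merge their outputs."""
    proposals = propose(image)          # stage 1: coarse candidates
    refined = refine(image, proposals)  # stage 2: tighter boxes and rescored classes
    extra = small_head(image, refined)  # stage 3: additional small-object detections
    merged = [d for d in refined + extra if d.score >= score_thr]
    return nms(merged, nms_thr)


# Toy stand-ins for the real models, for illustration only.
propose = lambda img: [Detection((10, 10, 40, 40), 0.4, "backpack")]
refine = lambda img, ps: [Detection((12, 12, 38, 38), 0.8, p.label) for p in ps]
small_head = lambda img, ds: [Detection((13, 13, 38, 38), 0.6, "backpack"),
                              Detection((100, 100, 120, 118), 0.5, "handbag")]

result = detect(None, propose, refine, small_head)
print([(d.label, d.score) for d in result])  # [('backpack', 0.8), ('handbag', 0.5)]
```

The duplicate backpack box from stage 3 overlaps the refined box with IoU of about 0.92 and is suppressed, while the separate handbag detection survives.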
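The abstract does not say which pruning criterion or quantization scheme is used. As one common choice, the sketch below assumes unstructured magnitude pruning (zeroing the smallest-magnitude weights) followed by symmetric per-tensor INT8 quantization, where each weight w is stored as q = round(w / s) with s = max|w| / 127. The function names are hypothetical. Note that the zeros introduced by unstructured pruning shrink the stored model only if the weights are kept in a sparse or compressed format.

```python
from typing import List, Tuple


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the `sparsity` fraction of weights with the smallest magnitude
    (unstructured magnitude pruning)."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(len(weights) * sparsity)  # number of weights to remove
    smallest_first = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in smallest_first[:k]:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor INT8 quantization: q = round(w / s), s = max|w| / 127."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0  # avoid dividing by zero for all-zero tensors
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map INT8 codes back to approximate float weights."""
    return [v * scale for v in q]


w = [0.9, -0.05, 0.02, -1.27, 0.4, 0.01]
pruned = magnitude_prune(w, sparsity=0.5)  # removes the 3 smallest-magnitude weights
q, scale = quantize_int8(pruned)
print(pruned)  # [0.9, 0.0, 0.0, -1.27, 0.4, 0.0]
print(q)       # [90, 0, 0, -127, 40, 0]
```

With a symmetric scheme each dequantized weight differs from its pruned value by at most s / 2; per-channel scales or calibration data are common refinements that this sketch leaves out.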
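Small-object recall, the target metric above, can be computed by restricting the ground truth to boxes under 32x32 pixels and counting how many are matched by a prediction. The sketch assumes axis-aligned boxes, an IoU threshold of 0.5, and greedy one-to-one, class-agnostic matching. It approximates the COCO "small" category (area below 32² pixels, which COCO measures on segment area) using box area. Function names are illustrative and not from the paper, and confidence scores are ignored, so this is recall at a fixed operating point rather than a full COCO-style evaluation.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels

SMALL_AREA = 32 * 32  # "small" threshold: box area below 32x32 pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def is_small(box: Box) -> bool:
    return (box[2] - box[0]) * (box[3] - box[1]) < SMALL_AREA


def small_object_recall(gt: List[Box], preds: List[Box], thr: float = 0.5) -> float:
    """Fraction of small ground-truth boxes matched by a prediction with IoU >= thr.
    Each prediction may match at most one ground-truth box (greedy, best IoU first)."""
    small_gt = [g for g in gt if is_small(g)]
    if not small_gt:
        return float("nan")  # recall is undefined when there are no small objects
    used = set()
    hits = 0
    for g in small_gt:
        best_j, best_iou = -1, thr
        for j, p in enumerate(preds):
            if j in used:
                continue
            v = iou(g, p)
            if v >= best_iou:
                best_j, best_iou = j, v
        if best_j >= 0:
            used.add(best_j)
            hits += 1
    return hits / len(small_gt)


gt = [(10, 10, 30, 30), (0, 0, 100, 100), (50, 50, 70, 66)]  # two small boxes, one large
preds = [(11, 11, 30, 30), (0, 0, 100, 100)]
print(small_object_recall(gt, preds))  # 0.5
```

Here the first small box is found (IoU about 0.90), the second is missed, and the large box is excluded from the count, giving a small-object recall of 0.5.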
