zcc31415926.github.io

Discussion: Development of R-CNN and YOLO Object Detectors

The R-CNN series

Three models: R-CNN, Fast R-CNN and Faster R-CNN

R-CNN:

Selective search for region proposal
CNN encoder for region-wise feature extraction
SVMs for classifying the features of each (warped, fixed-size) region
Linear regressors for bounding-box regression
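
The box regression step can be illustrated with the target parametrization commonly used in the R-CNN family. This is a minimal sketch, not the original implementation: the function names `encode_box` and `decode_box` are hypothetical, and boxes are assumed to be given as (center x, center y, width, height).

```python
import math

def encode_box(proposal, ground_truth):
    """Regression targets that map a proposal onto a ground-truth box.

    Both boxes are (center_x, center_y, width, height); this layout is an
    assumption made for the sketch.
    """
    px, py, pw, ph = proposal
    gx, gy, gw, gh = ground_truth
    # Center offsets are normalized by the proposal size;
    # width and height are regressed in log space.
    return ((gx - px) / pw, (gy - py) / ph,
            math.log(gw / pw), math.log(gh / ph))

def decode_box(proposal, targets):
    """Apply predicted targets to a proposal to obtain the refined box."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = targets
    return (px + tx * pw, py + ty * ph,
            pw * math.exp(tw), ph * math.exp(th))
```

Encoding a ground-truth box against a proposal and decoding the result returns the ground-truth box, which is what the regressor is trained to approximate.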

Fast R-CNN (based on R-CNN):

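region-wise -> global feature extraction: the CNN encoder runs once on the whole image
ROI pooling: feature (resized) -> (ROI pooled) classification, i.e. the features of each region are pooled to a fixed size instead of resizing the image region
joint training: separately trained CNN encoder & SVM & box regressor -> a single CNN for detection, with softmax classification and box regression trained jointly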
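
ROI pooling can be sketched on a single-channel feature map as follows. This is an illustrative sketch under simplifying assumptions: the name `roi_max_pool` is hypothetical, the ROI is given in integer feature-map coordinates with exclusive upper bounds, and the bin boundaries use plain integer division rather than the exact rounding of any particular framework.

```python
def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool one region of a 2-D feature map into an out_h x out_w grid.

    feature_map: list of rows (H x W) for a single channel.
    roi: (x0, y0, x1, y1) in feature-map coordinates, x1 and y1 exclusive.
    """
    x0, y0, x1, y1 = roi
    roi_h, roi_w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        # Row range of bin i; every bin covers at least one row.
        ys = y0 + (i * roi_h) // out_h
        ye = max(ys + 1, y0 + ((i + 1) * roi_h) // out_h)
        row = []
        for j in range(out_w):
            # Column range of bin j; every bin covers at least one column.
            xs = x0 + (j * roi_w) // out_w
            xe = max(xs + 1, x0 + ((j + 1) * roi_w) // out_w)
            row.append(max(feature_map[y][x]
                           for y in range(ys, ye)
                           for x in range(xs, xe)))
        pooled.append(row)
    return pooled
```

Regions of different sizes all produce an output of the same shape, which is what allows a single classification head to process every region.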

Faster R-CNN (based on Fast R-CNN):

selective search -> region proposal network (RPN) for region proposal
feature extraction and RPN share the weights of the CNN encoder
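
The RPN scores and refines a fixed set of reference boxes (anchors) at every feature-map location. The sketch below generates such a set for one location. The function name `make_anchors` is hypothetical, and the default scales and aspect ratios are assumed here for illustration (three scales and three ratios, giving nine anchors per location).

```python
import math

def make_anchors(center_x, center_y,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (x0, y0, x1, y1) anchors centered on one location.

    Each anchor has area scale**2 and height/width equal to ratio.
    The default scales and ratios are assumptions for this sketch.
    """
    anchors = []
    for scale in scales:
        for ratio in ratios:
            w = scale / math.sqrt(ratio)
            h = scale * math.sqrt(ratio)
            anchors.append((center_x - w / 2, center_y - h / 2,
                            center_x + w / 2, center_y + h / 2))
    return anchors
```

For each anchor the RPN predicts an objectness score and four box offsets; the highest-scoring refined anchors then play the role that selective search proposals played before.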

The YOLO series

Three models: YOLO v1-3

YOLO v1:

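image divided into an $n\times n$ grid; each cell is responsible for the object whose center falls in it, so at most $n\times n$ objects are detected
end-to-end training, outputting $n\times n$ vectors, each recording all attributes (box coordinates, confidence and class probabilities) of the boxes predicted by one cell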
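
The grid assignment and the size of the output can be made concrete with a short sketch. The function names are hypothetical, and the defaults (a 7 x 7 grid, 2 boxes per cell, 20 classes) are assumed here as a typical PASCAL VOC configuration rather than taken from the notes above.

```python
def yolo_v1_output_shape(grid_size=7, boxes_per_cell=2, num_classes=20):
    """Shape of the prediction tensor under the assumed defaults."""
    # Each box carries x, y, w, h and a confidence score;
    # class probabilities are shared by all boxes of a cell.
    return (grid_size, grid_size, boxes_per_cell * 5 + num_classes)

def responsible_cell(center_x, center_y, image_w, image_h, grid_size=7):
    """(row, col) of the grid cell that contains an object's center."""
    col = min(int(center_x / image_w * grid_size), grid_size - 1)
    row = min(int(center_y / image_h * grid_size), grid_size - 1)
    return row, col
```

Because class probabilities are predicted once per cell, two objects whose centers fall into the same cell compete for it, which is one motivation for the anchor-based design of the later versions.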

YOLO v2 (based on YOLO v1):

FC layers removed
anchor boxes introduced as references for box prediction
K-means clustering over the training boxes for anchor box generation
each grid cell corresponding to multiple boxes of different sizes and aspect ratios
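
Anchor generation by K-means can be sketched as below, using 1 - IoU as the distance between a box and a centroid so that large boxes do not dominate the clustering. This is a minimal sketch under stated assumptions: the names `wh_iou` and `kmeans_anchors` are hypothetical, boxes are (width, height) pairs, and centroids are initialized by random sampling.

```python
import random

def wh_iou(box, centroid):
    """IoU of two (width, height) boxes aligned at a shared corner."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    return inter / (box[0] * box[1] + centroid[0] * centroid[1] - inter)

def kmeans_anchors(boxes, k, iterations=100, seed=0):
    """Cluster a list of (width, height) boxes into k anchor shapes."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # Distance is 1 - IoU, so the nearest centroid has the highest IoU.
            best = max(range(k), key=lambda i: wh_iou(box, centroids[i]))
            clusters[best].append(box)
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(
                    (sum(w for w, _ in members) / len(members),
                     sum(h for _, h in members) / len(members)))
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[i])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return sorted(centroids)
```

The returned (width, height) pairs would then serve as the anchor shapes attached to every grid cell.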

YOLO v3 (based on YOLO v2):

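division of boxes into positive and negative samples for the objectness score
softmax cross entropy -> binary cross entropy loss for class prediction (independent logistic classifiers, allowing multiple labels per box)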
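
The difference between the two losses can be shown on raw class scores (logits). This is an illustrative sketch with hypothetical function names: the softmax version assumes exactly one correct class, while the binary version treats every class as an independent yes/no decision.

```python
import math

def softmax_cross_entropy(logits, target_index):
    """Cross entropy over a softmax: exactly one class is correct."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target_index]

def binary_cross_entropy(logits, targets):
    """Sum of independent per-class logistic losses; targets are 0 or 1."""
    loss = 0.0
    for z, t in zip(logits, targets):
        # Numerically stable form of -[t*log(p) + (1-t)*log(1-p)],
        # where p = sigmoid(z).
        loss += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
    return loss
```

With the binary form, a box can be labeled with several classes at once (for example overlapping categories), which a softmax cannot express because its probabilities must sum to one.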
