Discussion: Development of R-CNN and YOLO Object Detectors
The R-CNN series
Three models: R-CNN, Fast R-CNN and Faster R-CNN
R-CNN:
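- Selective search for region proposals
- CNN encoder for region-wise feature extraction (each proposal is resized to a fixed input size and passed through the CNN separately)
- SVMs for classifying the CNN features of each resized region
- Linear regressors on the CNN features for box regression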
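The box regression step can be sketched as follows. This is a minimal illustration of the usual center/size parameterization of regression targets, not the original implementation; the function names and the (cx, cy, w, h) box format are assumptions made for this example.

```python
import math


def box_regression_targets(proposal, ground_truth):
    """Targets that map a proposal onto its ground-truth box.

    Both boxes are (cx, cy, w, h) tuples (assumed format for this sketch).
    """
    px, py, pw, ph = proposal
    gx, gy, gw, gh = ground_truth
    tx = (gx - px) / pw          # center shift, scaled by proposal width
    ty = (gy - py) / ph          # center shift, scaled by proposal height
    tw = math.log(gw / pw)       # log-space width ratio
    th = math.log(gh / ph)       # log-space height ratio
    return tx, ty, tw, th


def apply_box_deltas(proposal, deltas):
    """Inverse of box_regression_targets: refine a proposal with predicted deltas."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = deltas
    return (px + tx * pw, py + ty * ph, pw * math.exp(tw), ph * math.exp(th))
```

The regressor is trained to predict the targets from the region features; at test time the predicted deltas are applied to the proposal to obtain the refined box.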
Fast R-CNN (based on R-CNN):
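- region-wise -> global feature extraction (the CNN runs once on the whole image)
- ROI pooling: resized region -> ROI-pooled feature for classification (each proposal is pooled to a fixed size from the shared feature map)
- joint training: separate CNN encoder & SVM & box regressor -> a single CNN for detection with classification and box regression heads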
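ROI pooling can be illustrated with a small pure-Python sketch. It assumes a single-channel feature map stored as a nested list and an ROI already expressed in feature-map cells; the name `roi_max_pool` and the floor/ceil bin boundaries are choices made for this example, not taken from a specific library.

```python
def ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool one ROI to a fixed out_h x out_w grid.

    feature_map: 2D list (H x W), single channel (assumption for brevity).
    roi: (x0, y0, x1, y1) in feature-map cells, x1 and y1 exclusive.
    """
    x0, y0, x1, y1 = roi
    roi_h, roi_w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        # Floor for the start and ceil for the end keep every bin non-empty.
        ys = y0 + (i * roi_h) // out_h
        ye = y0 + ceil_div((i + 1) * roi_h, out_h)
        row = []
        for j in range(out_w):
            xs = x0 + (j * roi_w) // out_w
            xe = x0 + ceil_div((j + 1) * roi_w, out_w)
            row.append(max(feature_map[y][x]
                           for y in range(ys, ye)
                           for x in range(xs, xe)))
        pooled.append(row)
    return pooled
```

Whatever the ROI size, the output has the same shape, which is what lets proposals of different sizes feed the same classification layers.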
Faster R-CNN (based on Fast R-CNN):
- selective search -> region proposal network (RPN) for region proposal
- the RPN and the detection network share the feature maps (and weights) of the same CNN encoder
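The RPN scores a fixed set of reference boxes (anchors) at every feature-map location. A minimal sketch of anchor generation is given below; the stride of 16 and the 3 scales x 3 aspect ratios are commonly quoted defaults and are used here as assumptions, and `make_anchors` is a name invented for this example.

```python
import math


def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchors as (x0, y0, x1, y1) in image coordinates.

    One group of len(scales) * len(ratios) anchors is centered on each
    feature-map cell. ratio is height / width; each anchor's area is scale**2.
    """
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            cx = (col + 0.5) * stride   # cell center in image pixels
            cy = (row + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    w = scale / math.sqrt(ratio)
                    h = scale * math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors
```

For each anchor the RPN outputs an objectness score and box deltas; the highest-scoring refined anchors become the region proposals.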
The YOLO series
Three models: YOLO v1, v2 and v3
YOLO v1:
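- image divided into an $n\times n$ grid; each cell is responsible for the object whose center falls inside it, so at most $n\times n$ objects are detected
- end-to-end training, outputting $n\times n$ vectors, each recording all attributes (coordinates, confidence, class probabilities) of the boxes predicted by one cell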
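The grid assignment can be sketched as below. The defaults ($n=7$, 2 boxes per cell, 20 classes) follow the commonly cited YOLO v1 configuration and are treated as assumptions here; the function names are made up for this example.

```python
def responsible_cell(cx, cy, img_w, img_h, n=7):
    """Return (row, col) of the grid cell containing the object center."""
    col = min(int(cx / img_w * n), n - 1)   # clamp centers on the right edge
    row = min(int(cy / img_h * n), n - 1)   # clamp centers on the bottom edge
    return row, col


def center_offsets(cx, cy, img_w, img_h, n=7):
    """Center position relative to the top-left corner of its cell, in [0, 1)."""
    row, col = responsible_cell(cx, cy, img_w, img_h, n)
    return cx / img_w * n - col, cy / img_h * n - row


def vector_length(boxes_per_cell=2, num_classes=20):
    """Per-cell output size: (x, y, w, h, confidence) per box plus class scores."""
    return boxes_per_cell * 5 + num_classes
```

With these defaults the network output has shape 7 x 7 x 30, and an object centered at (224, 100) in a 448 x 448 image is assigned to cell (row 1, column 3).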
YOLO v2 (based on YOLO v1):
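- FC layers removed (the network becomes fully convolutional)
- anchor boxes (reference boxes) introduced; box sizes are predicted relative to them
- K-means clustering of the training-set box sizes for anchor box generation
- each grid cell corresponding to multiple boxes of different scales and aspect ratios, one per anchor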
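Anchor generation by K-means can be sketched as follows, using $1 - \mathrm{IoU}$ between box sizes as the distance. The deterministic initialization and the mean-based centroid update are simplifications assumed for this example (implementations differ on both), and the function names are hypothetical.

```python
def wh_iou(a, b):
    """IoU of two boxes given as (w, h), assuming they share the same center."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(boxes, k, iters=100):
    """Cluster (w, h) pairs into k anchors; distance is 1 - IoU."""
    # Deterministic initialization for the sketch: evenly spaced samples.
    step = len(boxes) // k
    centroids = [boxes[i * step] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # Smallest 1 - IoU is the same as largest IoU.
            best = max(range(k), key=lambda c: wh_iou(box, centroids[c]))
            clusters[best].append(box)
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                new_centroids.append(
                    (sum(w for w, _ in members) / len(members),
                     sum(h for _, h in members) / len(members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return sorted(centroids)
```

Using IoU instead of Euclidean distance keeps large boxes from dominating the clustering error, so the resulting anchors fit small and large objects alike.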
YOLO v3 (based on YOLO v2):
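- predicted boxes divided into positive and negative samples by their overlap with the ground truth: the best-matching anchor is positive, other anchors above an overlap threshold are ignored, the rest are negative
- softmax cross entropy -> binary cross entropy loss for class prediction (independent logistic classifiers, so a box can carry several labels)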
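The difference between the two losses can be seen in a short sketch. It is a plain-Python illustration on raw logits, not the training code of any YOLO release; both function names are chosen for this example.

```python
import math


def softmax_cross_entropy(logits, target_index):
    """Cross entropy over a softmax: the classes compete for one label."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target_index]


def binary_cross_entropy(logits, targets):
    """Sum of independent per-class logistic losses; targets is a 0/1 list."""
    total = 0.0
    for z, t in zip(logits, targets):
        # Stable form of -[t*log(sigmoid(z)) + (1-t)*log(1-sigmoid(z))].
        total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
    return total
```

With logits `[10, 10]`, the binary loss for targets `[1, 1]` is close to zero, while the softmax loss cannot drop below $\log 2$ because the two classes must share the probability mass. This is why independent logistic outputs suit overlapping labels.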
References:
https://blog.csdn.net/u014380165/article/details/72851319
https://blog.csdn.net/weixin_43198141/article/details/90178512
https://www.jianshu.com/p/f87be68977cb