1980: Neocognitron
The Neocognitron, pioneered by Fukushima, marked the inception of convolutional neural networks (CNNs). It introduced hierarchical layers of cells with local receptive fields, modeled on the visual cortex’s response to local stimuli.
1998: LeNet-5
LeCun’s LeNet-5 made significant strides in handwritten digit classification. It employed shared weights, pooling layers, and backpropagation, laying the foundation for modern CNN architectures.
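To make these ideas concrete, here is a minimal LeNet-5-style network in PyTorch. This is a sketch of the architecture family, not LeCun's exact 1998 configuration: the convolutions implement weight sharing, and the average-pooling layers implement subsampling.

```python
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # shared weights via convolution
            nn.Tanh(),
            nn.AvgPool2d(2),                  # subsampling (pooling) layer
            nn.Conv2d(6, 16, kernel_size=5),
            nn.Tanh(),
            nn.AvgPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),
            nn.Tanh(),
            nn.Linear(120, 84),
            nn.Tanh(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# 32x32 grayscale input, as in the original digit-classification setting
logits = LeNet5()(torch.randn(1, 1, 32, 32))
```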
2012: AlexNet
Krizhevsky et al.’s AlexNet revolutionized image classification. It stacked multiple convolutional layers and combined ReLU activations with dropout regularization, winning the 2012 ImageNet challenge (ILSVRC) by a wide margin.
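The two ingredients highlighted above are easy to show in isolation. The sketch below uses illustrative layer sizes rather than the exact 2012 network: ReLU follows each convolution, and dropout regularizes the fully connected classifier.

```python
import torch.nn as nn

# Convolutional stage: non-saturating ReLU activations speed up training
block = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
)

# Classifier stage: dropout randomly zeroes units to reduce overfitting
classifier = nn.Sequential(
    nn.Dropout(p=0.5),
    nn.Linear(4096, 4096),
    nn.ReLU(inplace=True),
    nn.Dropout(p=0.5),
    nn.Linear(4096, 1000),   # 1000 ImageNet classes
)
```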
2014: VGGNet
VGGNet, developed by Simonyan and Zisserman, pushed depth further than AlexNet by stacking many small 3x3 convolutions into networks of 16 to 19 weight layers. It demonstrated the benefits of increased depth for learning complex image features.
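The VGG recipe of depth through repeated 3x3 convolutions can be sketched as a reusable block; the structure below is illustrative rather than a faithful reproduction of VGG-16. Two stacked 3x3 convolutions cover the same 5x5 receptive field as one larger filter, with fewer parameters and an extra nonlinearity.

```python
import torch.nn as nn

def vgg_block(in_ch, out_ch, num_convs):
    """Stack num_convs 3x3 conv+ReLU pairs, then halve spatial size."""
    layers = []
    for _ in range(num_convs):
        layers += [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                   nn.ReLU(inplace=True)]
        in_ch = out_ch
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

block = vgg_block(64, 128, num_convs=2)
```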
2014: GoogLeNet
Szegedy et al.’s GoogLeNet introduced inception modules, which apply filters of several sizes in parallel and concatenate the results, capturing features at multiple scales. It achieved top accuracy in the 2014 ImageNet challenge while using far fewer parameters than contemporaries such as VGGNet.
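A simplified Inception-style module might look as follows; the branch widths are illustrative, not GoogLeNet's exact channel counts. The 1x1 convolutions act as bottlenecks that cut the cost of the wider filters.

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 64, kernel_size=1)
        self.b2 = nn.Sequential(
            nn.Conv2d(in_ch, 96, kernel_size=1),  # 1x1 bottleneck cuts cost
            nn.Conv2d(96, 128, kernel_size=3, padding=1),
        )
        self.b3 = nn.Sequential(
            nn.Conv2d(in_ch, 16, kernel_size=1),
            nn.Conv2d(16, 32, kernel_size=5, padding=2),
        )
        self.b4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 32, kernel_size=1),
        )

    def forward(self, x):
        # Run all branches in parallel, join along the channel axis
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

out = InceptionBlock(192)(torch.randn(1, 192, 28, 28))  # -> (1, 256, 28, 28)
```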
2015: ResNet
He et al.’s ResNet introduced skip (residual) connections that let activations and gradients bypass intermediate layers. This addressed the vanishing gradient problem and made it practical to train networks hundreds of layers deep.
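A basic residual block can be sketched as follows. This is simplified: it omits the projection shortcut ResNet uses when the shapes of the two branches differ.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # Skip connection: the identity path gives gradients a direct route
        return self.relu(out + x)

y = ResidualBlock(64)(torch.randn(1, 64, 56, 56))
```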
2017: MobileNet
Howard et al.’s MobileNet introduced depthwise separable convolutions, sharply reducing computational cost. It targeted mobile and embedded applications where compute and memory are tightly constrained.
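The core building block is easy to sketch: a depthwise 3x3 convolution (one filter per input channel) followed by a pointwise 1x1 convolution, replacing one dense 3x3 convolution at a fraction of the multiply-adds. The layer sizes below are illustrative.

```python
import torch.nn as nn

def depthwise_separable(in_ch, out_ch, stride=1):
    return nn.Sequential(
        # Depthwise: groups=in_ch applies one 3x3 filter per input channel
        nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1,
                  groups=in_ch, bias=False),
        nn.BatchNorm2d(in_ch),
        nn.ReLU(inplace=True),
        # Pointwise: 1x1 conv mixes information across channels
        nn.Conv2d(in_ch, out_ch, 1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

layer = depthwise_separable(32, 64)
```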
2019: EfficientNet
Tan and Le’s EfficientNet adopted a compound scaling approach, optimizing network depth, width, and resolution simultaneously. It achieved high accuracy while maintaining computational efficiency.
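The compound scaling rule itself is a small formula: depth, width, and input resolution grow together as powers of a single coefficient phi. The alpha, beta, gamma values below are those reported in the EfficientNet paper; the base depth, width, and resolution arguments are illustrative.

```python
# Compound scaling: depth ~ alpha^phi, width ~ beta^phi, resolution ~ gamma^phi,
# with alpha * beta^2 * gamma^2 held near 2 so FLOPs roughly double per phi step.
alpha, beta, gamma = 1.2, 1.1, 1.15

def compound_scale(phi, base_depth, base_width, base_resolution):
    """Return (depth, width, resolution) scaled by the compound factors."""
    return (round(base_depth * alpha ** phi),
            round(base_width * beta ** phi),
            round(base_resolution * gamma ** phi))

print(compound_scale(phi=1, base_depth=16, base_width=32, base_resolution=224))
```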
Recent Developments:
- ViT/Swin Transformer: Vision Transformers (ViTs) and Swin Transformers replace convolutions with self-attention mechanisms and have shown strong results in large-scale image classification (see the sketch after this list).
- CNN-Transformer Hybrids: Combining convolutional and attention-based components has led to advances in image classification accuracy and explainability.
- Domain Adaptation: CNNs adapted to specific domains (e.g., medical imaging, remote sensing) improve performance in specialized applications.
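As referenced in the first bullet, a minimal ViT-style step can be sketched as below: the image is split into patches, embedded as tokens, and processed with multi-head self-attention instead of convolution. This is illustrative only; it omits positional embeddings, the class token, and the MLP and normalization layers of a full Transformer block.

```python
import torch
import torch.nn as nn

img = torch.randn(1, 3, 224, 224)

# Patch embedding: a strided conv carves the image into 16x16 patches
patch_embed = nn.Conv2d(3, 768, kernel_size=16, stride=16)
tokens = patch_embed(img).flatten(2).transpose(1, 2)   # (1, 196, 768)

# Self-attention: every patch token attends to every other patch token
attn = nn.MultiheadAttention(embed_dim=768, num_heads=12, batch_first=True)
out, _ = attn(tokens, tokens, tokens)
```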
The evolution of CNNs for image classification has witnessed remarkable progress. From simple architectures to deep and efficient networks, CNNs continue to push the boundaries of computer vision capabilities. As research advances, we can expect further innovations and applications in this field.
Kind regards
J.O. Schneppat