Image Classification of Human Epithelial Cells Using Deep Residual Learning

Abstract

In layman's terms, deep learning stacks multiple layers of neurons. These neurons lie on connected layers in which the output of the previous layer is passed to each node in the next layer. Stacking more layers has successfully improved the performance of neural networks. However, one problem commonly cited by researchers and practitioners is that in deep networks composed of many dozens of layers, accuracy can saturate and then degrade. This is commonly attributed to the ‘vanishing gradient’ problem, in which the gradients become too small to update the weights usefully. Batch normalization has helped to reduce the vanishing gradient problem by normalizing the activations of every layer to zero mean and unit variance, using statistics computed per batch during training. However, the problem still arises when networks become very deep.

Accurate classification of human epithelial (HEp-2) cells is essential, yet automatic classification of HEp-2 cells from microscope images is a demanding and challenging task. The main reason is the variation in illumination combined with the low contrast of the cells. To address this problem, we present a comparative study of how conventional deep networks and deep residual networks perform on this task and what accuracy they achieve.

Key Words— Deep residual networks, HEp-2 cells, Image classification

Introduction

The human epithelial type 2 (HEp-2) cell divides and produces a large number of antigens. Indirect immunofluorescence is used primarily for antinuclear antibody (ANA) testing in the diagnosis and treatment of autoimmune diseases. Human experts use a fluorescence microscope to identify antinuclear antibody patterns by visual examination. Inhomogeneous illumination, however, leads to substantial intra-class variations that make the classification of HEp-2 cells challenging. To address this challenge, traditional methods mainly involve three steps: feature extraction, feature encoding, and classification. However, these methods rely mainly on hand-crafted features, which limits classification performance. To strengthen the feature representation and thereby improve classification, it is crucial to extract more discriminative and informative features effectively.

We therefore used three models for classification: VGG, ResNet50, and AlexNet. Each stage of a ResNet is made up of several blocks. When ResNets go deeper, they usually do so by increasing the number of operations within a block, while the number of stages remains the same. One of the problems that ResNets address is the well-known vanishing gradient. As a network becomes very deep, the gradients of the loss function shrink quickly toward zero after many applications of the chain rule. As a result, the weights never update their values, and no learning is carried out. With ResNets, the gradients flow back from later layers to the initial filters directly through the skip connections.
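The effect of skip connections on gradient flow can be illustrated with a toy calculation. The sketch below is not the training code used in this work: it assumes a chain of scalar layers that all share one hypothetical local derivative, and it compares the gradient that reaches the first layer of a plain stack with that of a residual stack, where the identity shortcut adds 1 to each layer's derivative.

```python
def plain_gradient(local_derivative: float, depth: int) -> float:
    """Gradient reaching the first layer of a plain stack of scalar layers.

    By the chain rule, the gradient is the product of the local derivatives.
    """
    grad = 1.0
    for _ in range(depth):
        grad *= local_derivative
    return grad


def residual_gradient(local_derivative: float, depth: int) -> float:
    """Same product for layers of the form y = f(x) + x.

    The identity shortcut contributes a derivative of 1, so each factor
    becomes 1 + f'(x) instead of f'(x).
    """
    grad = 1.0
    for _ in range(depth):
        grad *= 1.0 + local_derivative
    return grad


# Hypothetical values chosen only for illustration.
print(plain_gradient(0.01, 50))     # vanishingly small
print(residual_gradient(0.01, 50))  # stays on the order of 1
```

With small local derivatives the plain product collapses toward zero, whereas the residual product stays close to 1, which is the intuition behind gradients flowing directly through the shortcuts.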

Related Work and Literature Survey

In [1], the authors observe that deeper neural networks are more difficult to train. They present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. They explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. They provide comprehensive empirical evidence showing that these residual networks are easier to optimize and can gain accuracy from considerably increased depth. On the ImageNet dataset, they evaluate residual nets with a depth of up to 152 layers, 8× deeper than VGG nets but still of lower complexity. An ensemble of these residual nets achieves an error of 3.57 percent on the ImageNet test set. They also present an analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely because of their extremely deep representations, they obtain a 28 percent relative improvement on the COCO object detection dataset.
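The reformulation can be stated compactly. In the notation of [1], x is the input to a block of stacked layers, H(x) is the mapping the block is meant to represent, and F is the residual function that the stacked layers with weights W_i actually learn:

```latex
\mathcal{F}(x) := \mathcal{H}(x) - x,
\qquad
y = \mathcal{F}(x, \{W_i\}) + x
```

If the desired mapping is close to the identity, the layers only need to push F toward zero, which is presumed to be easier than fitting an identity mapping with a stack of nonlinear layers.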

In [2], an automated HEp-2 cell classification method based on ResNet-50 was proposed: a deeply supervised residual network (DSRN) trained with cross-modal transfer learning. A deep supervision mechanism was implemented by attaching auxiliary classifiers to hidden-layer features. This approach not only eases the optimization problems of training ResNet-50 but also strengthens the discriminative ability of the network. The DSRN can effectively assist ResNet-50 in extracting discriminative features. The authors also propose a cross-modal transfer learning strategy to accelerate network convergence, reduce computational complexity, and improve network performance. Their experiments indicate that the approach outperforms conventional methods for classifying HEp-2 cells. They also compare different DSRN variants in order to evaluate classification performance. The success of the proposed method suggests a high potential for clinical diagnosis and, according to the authors, lays a foundation for their future research.

HEp-2 cell classification is one of the most critical steps in the automatic diagnosis of autoimmune diseases. The authors of [3] proposed a classification method that, unlike most state-of-the-art methods, uses an unsupervised deep-learning feature extraction system. They suggested the use of two deep convolutional autoencoders. The first network takes the original cell images as inputs and learns to reconstruct them in order to capture features related to the global structure of the cells. The second network takes the gradients of the images as inputs and learns to reconstruct the original images in order to capture the local intensity changes described by their gradient maps.
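To make the input of the second autoencoder concrete, the following sketch computes a gradient-magnitude map for a grayscale image. The gradient operator used in [3] is not stated here, so the central finite differences with replicated borders are an assumption, and `gradient_magnitude` is a hypothetical helper name.

```python
import math


def gradient_magnitude(image):
    """Per-pixel gradient magnitude of a 2D grayscale image (list of rows).

    Uses central differences; border pixels reuse the nearest valid
    neighbour (replicated edges). This operator is an assumption.
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            left = image[r][max(c - 1, 0)]
            right = image[r][min(c + 1, width - 1)]
            up = image[max(r - 1, 0)][c]
            down = image[min(r + 1, height - 1)][c]
            gx = (right - left) / 2.0  # horizontal intensity change
            gy = (down - up) / 2.0     # vertical intensity change
            out[r][c] = math.hypot(gx, gy)
    return out


# A vertical step edge: the map responds only around the intensity jump.
step = [[0, 0, 10, 10]] * 3
print(gradient_magnitude(step)[1])
```

Flat regions map to zero while edges give large values, which is why a gradient map emphasizes local intensity changes rather than global structure.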

To emphasize valuable local pathological information while suppressing less relevant features, a new deep feature learning framework was developed in [16] as the basis for a celiac disease (CD) diagnostic assistance tool. CNN-based approaches form a family of models that do not require a separate feature selection step. The proposed BCSE learning module offers an alternative to traditional machine learning algorithms for improving CD diagnostics with deep learning. The authors extracted dense feature vectors from ResNet50 and Inception-v3 embedded with SE, SCSE, and BCSE modules. Classical SVM (RBF kernel), KNN, and LDA classifiers were then used to validate the proposed methods. The findings showed that block-wise channel squeeze-and-excitation (BCSE) effectively separates villous atrophy in CD from controls, with 95.94 percent accuracy, 97.20 percent sensitivity, and 95.63 percent specificity.

Research has become increasingly oriented toward computer-aided CD diagnostics. Given the unique imaging modality, the scale-invariant feature transform (SIFT) is of interest and could be incorporated into deep learning approaches in future research. Deep learning methods have already demonstrated substantial benefits in medical image processing. More efficient algorithms and datasets of greater size and complexity can be considered for applying CD diagnostic technologies to rapid diagnosis and treatment.

The findings presented in [15] lead to the conclusion that, for the classification of HEp-2 cell images, the majority of the CNNs studied achieved better results when trained from scratch on augmented datasets without any other preprocessing strategy. Using 5-fold cross-validation, the best preprocessing approach for training CNNs from scratch was the use of original images with data augmentation for LeNet-5, AlexNet, and Inception-V3. In contrast, the best results for ResNet-50 were obtained with contrast stretching and data augmentation. The CNN models, equipped with their best preprocessing strategies, were then evaluated on the test set using k-fold cross-validation; Inception-V3 obtained the best performance, with an accuracy of 98.28 percent, followed by VGG-16 with 96.82 percent and ResNet-50 with 96.38 percent. LeNet-5 and AlexNet both reached the lowest accuracy of 94.35 percent.
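The 5-fold cross-validation protocol mentioned above can be sketched as follows. This is a minimal illustration, not the splitting code of [15]: the function name `k_fold_indices`, the shuffling seed, and the round-robin assignment of samples to folds are all assumptions.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Return k (train_indices, validation_indices) pairs.

    Samples are shuffled once and dealt round-robin into k folds; each
    fold serves as the validation set exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, validation))
    return splits


for train, validation in k_fold_indices(10, k=5):
    print(len(train), len(validation))
```

Each model would be trained k times, once per split, and the validation scores averaged; for imbalanced classes a stratified variant would usually be preferred.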

Although fine-tuning appears to be a viable method, it did not achieve higher accuracy than the CNNs trained from scratch, particularly when data augmentation was used. It may be useful when pre-trained networks are available and only limited time is available for training, as fine-tuning takes less time because only the unfrozen layers are trained. With fine-tuning, the best results were obtained using data augmentation, contrast stretching, and mean subtraction, with an accuracy of 96.69 percent on the test set.
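The two preprocessing steps named above can be sketched in a few lines. The exact variants used in [15] are not given here, so this example assumes plain min-max contrast stretching to the range 0 to 255 and subtraction of the per-image mean; both function names are hypothetical.

```python
def contrast_stretch(image, new_min=0.0, new_max=255.0):
    """Linearly rescale pixel intensities to [new_min, new_max]."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant image carries no contrast to stretch.
        return [[new_min for _ in row] for row in image]
    scale = (new_max - new_min) / (hi - lo)
    return [[(p - lo) * scale + new_min for p in row] for row in image]


def subtract_mean(image):
    """Center the image by subtracting its mean intensity (assumed per image)."""
    flat = [p for row in image for p in row]
    mean = sum(flat) / len(flat)
    return [[p - mean for p in row] for row in image]


low_contrast = [[50, 100], [150, 200]]
stretched = contrast_stretch(low_contrast)
print(stretched)
print(subtract_mean(stretched))
```

Stretching spreads a narrow intensity range over the full scale, which is relevant for low-contrast fluorescence images, while mean subtraction centers the inputs around zero before they are fed to the network.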

A variation of the proposed approaches could be helpful, allowing health professionals to select the right approach for the automated classification of HEp-2 cells and to help recognize autoimmune diseases. The work also represents a significant step in the study of preprocessing techniques, as several CNN architectures were compared, and the best result of 98.28 percent was very close to the highest accuracy reported in the literature; this result was obtained with Inception-V3 trained from scratch using augmented data without any other preprocessing technique.

Methodology

The sole purpose of this paper is to analyze and compare the performance of various types of CNNs in classifying human epithelial type 2 (HEp-2) cells. Figure 1 illustrates the methodology used here.

The dataset used here contains 1800 images of human epithelial cells, each belonging to one of five classes – Homogeneous, Coarse Speckled, Fine Speckled, Nucleolar, and Centromere.

All images in this dataset are resized to 224×224 pixels. Resizing uses bilinear interpolation, that is, linear interpolation in two directions, chosen so that the image content is preserved as far as possible when the images are rescaled. The images were then preprocessed.
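Bilinear resizing can be sketched as below. This is an illustration only: the pipeline presumably relied on a library routine, and the corner-aligned coordinate mapping used here is an assumption (libraries differ in how they align pixel centers). The name `resize_bilinear` is hypothetical.

```python
def resize_bilinear(image, out_h, out_w):
    """Resize a 2D grayscale image (list of rows) with bilinear interpolation.

    Output corners map exactly onto input corners (corner-aligned mapping).
    """
    in_h, in_w = len(image), len(image[0])
    out = []
    for i in range(out_h):
        # Fractional source row for this output row.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        dy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            dx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = image[y0][x0] * (1 - dx) + image[y0][x1] * dx
            bottom = image[y1][x0] * (1 - dx) + image[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        out.append(row)
    return out


small = [[0, 10], [20, 30]]
print(resize_bilinear(small, 3, 3))
resized = resize_bilinear(small, 224, 224)
print(len(resized), len(resized[0]))
```

Every output pixel is a weighted average of at most four input pixels, so the whole image is rescaled rather than cropped or padded.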

Three CNN architectures were used in this experiment and compared for their performance in the classification of images. These architectures are explained as follows:

AlexNet – AlexNet uses dropout to reduce overfitting in the fully connected layers and ReLU as the activation function in the convolutional and fully connected layers. Figure 2 shows the architecture of the AlexNet model.
MiniVGG/VGG7 – The VGG variant used here comprises 7 convolutional layers; VGG-style networks are popular because of their very uniform architecture. In contrast to AlexNet, they use only 3×3 convolutions, but with many filters. VGGNet is a widely preferred choice in the community for extracting features from images. The weight configuration of VGGNet is publicly available and has been used as a baseline feature extractor in many other applications and challenges. However, the full VGGNet (VGG-16) has 138 million parameters, which can be difficult to handle. Figure 3 shows the architecture of the miniVGG model.
ResNet50 – This architecture features ‘skip connections’ and heavy use of batch normalization. These skip connections resemble the gated units used in recent successful RNN architectures, such as gated recurrent units. A residual neural network (ResNet) is a kind of artificial neural network (ANN) that builds on constructs known from pyramidal cells. It does so by using skip connections, or shortcuts, to jump over some layers. Typical ResNet models are implemented with double- or triple-layer skips that contain nonlinearities (ReLU) and batch normalization in between.
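The double-layer skip described for ResNet can be sketched as a forward pass on plain vectors. This is a simplified illustration, not the ResNet50 implementation used in the experiments: it uses small dense weight matrices instead of convolutions and omits batch normalization, and all names are hypothetical.

```python
def relu(vector):
    """Element-wise ReLU nonlinearity."""
    return [max(0.0, value) for value in vector]


def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def residual_block(x, w1, w2):
    """Two-layer residual block: y = ReLU(W2 * ReLU(W1 * x) + x).

    The '+ x' term is the skip connection that jumps over both layers.
    Batch normalization is omitted to keep the sketch short.
    """
    hidden = relu(matvec(w1, x))
    residual = matvec(w2, hidden)
    return relu([r + a for r, a in zip(residual, x)])


identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
print(residual_block([1.0, 2.0], zeros, zeros))        # input passes through the shortcut
print(residual_block([1.0, -2.0], identity, identity))
```

With all weights at zero the block still passes a non-negative input through unchanged, which shows how the shortcut lets a very deep stack fall back to an identity mapping.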

Results and Discussion

This project focuses on comparing deep residual learning with conventional deep learning and on showing why deep residual learning gives good accuracy together with the lowest error rate. We therefore investigate each model from scratch. TensorFlow is used throughout the project because of its wide use in deep learning research and production, and all results are obtained with it.

We used three models in this project: VGG7, ResNet50, and AlexNet. AlexNet appeared in 2012 and was a major advance; it improved on conventional convolutional neural networks (CNNs) and was arguably the best model for image classification until VGG appeared. The VGG variant used here comprises 7 convolutional layers and is appealing because of its very uniform architecture. In contrast to AlexNet, it uses only 3×3 convolutions, but with a large number of filters. VGGNet is a widely favored model among computer vision practitioners for extracting features from images. Its weight configuration is publicly available and has been used in many other applications and challenges. However, the full VGGNet (VGG-16) has 138 million parameters, which can be challenging to handle. ResNet50 features ‘skip connections’ and heavy use of batch normalization. These skip connections act like the gated units used in recent RNNs, such as gated recurrent units. A residual neural network (ResNet) is a kind of artificial neural network (ANN) that builds on constructs known from pyramidal cells. It does so by using skip connections, or shortcuts, to jump over certain layers. Typical ResNet models are implemented with two-layer or three-layer skips that contain nonlinearities (ReLU) and batch normalization in between.

Final Result

ResNet50 was found to be the best model for the classification of human epithelial cell images. Comparing conventional deep learning models with deep residual learning models, the deep residual models give better results.

Conclusion

Among all the algorithms tried, ResNet50 had the lowest loss and good accuracy. For an image classification problem in healthcare, it is imperative to obtain results with the lowest possible loss. The VGG7 model achieved good accuracy, but its loss was five times that of ResNet50; even with reasonable accuracy, we therefore prefer the model with the lowest loss and good accuracy. AlexNet showed neither good accuracy nor a low loss, and it also took the longest time to compute. Therefore, it is not an ideal model for the classification of human epithelial type-2 cells.
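Why two models with similar accuracy can differ strongly in loss may be seen from a small example. The sketch below assumes that the reported loss is a cross-entropy loss, which is not stated explicitly in this paper, and the predicted probabilities are hypothetical values, not results from the experiments.

```python
import math


def accuracy(probabilities, labels):
    """Fraction of samples whose most probable class equals the label."""
    correct = 0
    for probs, label in zip(probabilities, labels):
        predicted = max(range(len(probs)), key=lambda i: probs[i])
        correct += predicted == label
    return correct / len(labels)


def cross_entropy(probabilities, labels):
    """Mean negative log-probability assigned to the true class."""
    total = -sum(math.log(probs[label]) for probs, label in zip(probabilities, labels))
    return total / len(labels)


labels = [0, 1]
confident = [[0.9, 0.1], [0.1, 0.9]]  # hypothetical model A
hesitant = [[0.6, 0.4], [0.4, 0.6]]   # hypothetical model B

print(accuracy(confident, labels), cross_entropy(confident, labels))
print(accuracy(hesitant, labels), cross_entropy(hesitant, labels))
```

Both hypothetical models classify every sample correctly, yet the hesitant one has a much higher loss because it assigns less probability to the true class. Under this reading, a lower loss indicates more confident correct predictions, which is one reason to prefer it alongside accuracy.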

References

Deep Residual Learning for Image Recognition, by He, K., Zhang, X., Ren, S., & Sun, J. (2016).
Haijun Lei, Tao Han, Feng Zhou, Zhen Yu, Jing Qin, Ahmed Elazab, Baiying Lei, A Deeply Supervised Residual Network for HEp-2 Cell Classification via Cross-Modal Transfer Learning, Pattern Recognition (2018).
J. Xi, S. Linlin, Z. Xiande and Y. Shiqi, ‘Deep convolutional neural network-based HEp-2 cell classification,’ in 2016 23rd International Conference on Pattern Recognition (ICPR), 2016, pp. 77-80.
M. Long, Y. Cao, J. Wang, and M. I. Jordan, ‘Learning Transferable Feature with Deep Adaptation Networks,’ Computer Science, pp. 97-105, 2015.
N. Tajbakhsh, J. Y. Shin, S. R. Gurudu, R. T. Hurst, C. B. Kendall, M. B. Gotway, et al., ‘Convolutional neural networks for medical image analysis: Full training or fine tuning?,’ IEEE Trans. Med. Imag., vol. 35, pp. 1299-1312, 2016.
H. T. H. Phan, A. K., J. Kim, D. Feng, ‘Transfer learning of a convolutional neural network for HEp-2 cell image classification’, International Symposium on Biomedical Imaging, pp. 1208-1211, 2016.
Z. Gao, J. Z., L. Zhou, L. Wang, ‘HEp-2 Cell Image Classification with Convolutional Neural Networks’ in Pattern Recognition Techniques for Indirect Immunofluorescence Images, pp. 24-28, 2014.
L. S. Xi Jia, Xiande Zhou, Shiqi Yu, ‘Deep convolutional neural network-based HEp-2 cell classification’, 23rd International Conference on Pattern Recognition (ICPR), pp. 77-80, 2016.
Y. Li, L. S., S. Yu, ‘HEp-2 Specimen Image Segmentation and Classification Using Very Deep Fully Convolutional Network’ in IEEE Transactions on Medical Imaging, 2017.
R. K. Srivastava, K. Greff, and J. Schmidhuber, ‘Training very deep networks,’ arXiv:1507.06228, 2015.
A. Shahin, Y. Guo, K. Amin, and A. A. Sharawi, “White blood cells identification system based on convolutional deep neural learning networks,” Computer methods and programs in biomedicine, 2017
N. N. Sultana, B. Mandal, and N. Puhan, “Deep residual network with regularised fisher framework for detection of melanoma,” IET Computer Vision, vol. 12, no. 8, pp. 1096-1104, 2018.
Xinle Wang, Haiyang Qian, Edward J. Ciaccio, Suzanne K. Lewis, Govind Bhagat, Peter H. Green, Shenghao Xu, Liang Huang, Rongke Gao, Yu Liu, Celiac disease diagnosis from video capsule endoscopy images with residual learning and deep feature extraction, Computer Methods and Programs in Biomedicine (2019).
Sun, Lei. “ResNet on Tiny ImageNet.” (2017).
Ferreira Rodrigues, Larissa & Naldi, Murilo & Mari, Joao. (2019). Comparing convolutional neural networks and preprocessing techniques for HEp-2 cell classification in immunofluorescence images. Computers in Biology and Medicine. 116. 10.1016/j.compbiomed.2019.103542.
Xinle Wang, Haiyang Qian, Edward J. Ciaccio, Suzanne K. Lewis, Govind Bhagat, Peter H. Green, Shenghao Xu, Liang Huang, Rongke Gao, Yu Liu, Celiac disease diagnosis from video capsule endoscopy images with residual learning and deep feature extraction, Computer Methods and Programs in Biomedicine (2019).
