Fresh and Rotten Fruits - Product inspection with Convolutional Neural Networks
The objective of this analysis is to develop a deep learning model that accurately classifies images of fresh and rotten fruits. We explored three models: a baseline Convolutional Neural Network (CNN), an enhanced CNN with Batch Normalization, and a VGG16-based model with pretrained layers. The analysis aims to automate fruit quality assessment, which is traditionally done manually and is both labor-intensive and error-prone. Automation is expected to reduce costs and improve efficiency for stakeholders such as fruit distributors and retailers. The study utilized a diverse and realistic dataset, including augmented images, to ensure robust performance across real-world conditions. Among the models tested, the CNN with Batch Normalization trained on the augmented data achieved the highest validation accuracy of 95%, making it the best-performing model in this study. It also balanced precision and recall, making it the most reliable for distinguishing between fresh and rotten fruits. Its success demonstrates the effectiveness of augmentation and Batch Normalization in improving model performance while maintaining computational efficiency, making it well-suited for deployment in real-world fruit quality assessment tasks.

1. Objective of the Analysis
The primary goal of this analysis is to develop a robust deep learning model that can accurately classify images of fresh and rotten fruits. As a base model we use a Convolutional Neural Network (CNN), given its proven effectiveness in image classification tasks. As a second model we extend the base model with Batch Normalization layers. As a third model we include the pretrained VGG16, with its pretrained layers extended by customized final layers. This analysis is intended to assist stakeholders, including fruit distributors and retailers, by automating the process of assessing fruit quality. This automation can significantly reduce labor costs and minimize the errors associated with manual inspections.
Value of the Analysis
- Current Challenges: Typically, the task of distinguishing fresh from rotten fruits is performed manually, which is not only inefficient but also costly for fruit farmers and vendors. Therefore, creating an automated classification model is essential to reduce human effort, lower costs, and expedite production in the agriculture industry.
- Comprehensive Scope: Compared to other datasets, this one includes a wider range of fruit classes, which can help researchers achieve better performance and more accurate classifications.
- Realistic Data: The images in this dataset were collected from various fruit markets and fields under natural weather conditions, which include varying lighting conditions. This diversity in data collection makes it challenging to identify defects with the naked eye, enhancing the dataset's realism.
- Economic Impact: Early detection of fruit spoilage, as facilitated by this research, could enable farmers to produce larger quantities of quality fruit, thereby contributing positively to the economy.
2. Dataset Description
Dataset Overview
Fruits play a critical role in the economic development of many countries, and consumers demand fresh, high-quality produce. Since fruits naturally degrade over time, this can negatively impact the economy. It's estimated that about one-third of all fruits become rotten, leading to significant financial losses. Furthermore, spoiled fruits can harm public perception and health, which can reduce sales. This dataset provides a foundation for developing algorithms aimed at the early detection and classification of fresh versus rotten fruits, which is vital for mitigating these issues in the agricultural sector.
This dataset includes an extensive collection of fruit images across sixteen categories, including fresh and rotten variants of apples, bananas, oranges, grapes, guavas, jujubes, pomegranates, and strawberries. In total, there are 3,200 original images and an additional 12,335 augmented images. All images are uniformly sized at 512 × 512 pixels.
Citation:
Sultana, Nusrat; Jahan, Musfika; Uddin, Mohammad Shorif (2022), “Fresh and Rotten Fruits Dataset for Machine-Based Evaluation of Fruit Quality”, Mendeley Data, V1, doi: 10.17632/bdd69gyhv8.1.
Dataset Summary
Category | No. of Original Images | No. of Augmented Images |
---|---|---|
Fresh Apple | 200 | 734 |
Rotten Apple | 200 | 738 |
Fresh Banana | 200 | 740 |
Rotten Banana | 200 | 736 |
Fresh Orange | 200 | 796 |
Rotten Orange | 200 | 796 |
Fresh Grape | 200 | 800 |
Rotten Grape | 200 | 746 |
Fresh Guava | 200 | 797 |
Rotten Guava | 200 | 797 |
Fresh Jujube | 200 | 793 |
Rotten Jujube | 200 | 793 |
Fresh Pomegranate | 200 | 797 |
Rotten Pomegranate | 200 | 798 |
Fresh Strawberry | 200 | 737 |
Rotten Strawberry | 200 | 737 |
Total | 3,200 | 12,335 |
Description of Fruit Classes
Here are brief descriptions of the covered fruit classes:
Fruit Type | Fresh Condition | Rotten Condition |
---|---|---|
Apple | Firm, crisp flesh with bright, unblemished skin. Pleasant, sweet aroma. | Brown, mushy flesh. Skin may show signs of mold or discoloration. Sour or fermented smell. |
Banana | Curved with thick peel, ranging from green (unripe) to yellow with brown spots (ripe). Firm and sweet. | Mushy texture, peel is overly soft and dark brown to black. Strong, overripe smell, possible mold. |
Orange | Firm, smooth peel. Heavy for size. Juicy and sweet flesh, bright orange color. | Soft, squishy texture. Peel shows mold or dark spots. Flesh may be discolored, sour odor. |
Grape | Plump and firm with vibrant colors (green, red, purple). Stems are green and flexible. | Shriveled, sticky, or moldy. Stems dry, grapes may smell sour or vinegar-like. |
Guava | Firm with slightly rough outer skin, color can be green, yellow, or maroon. Aromatic scent. | Brown or black spots inside, skin may split or show fungal infection. Flesh becomes soft and discolored. |
Jujube | Green when unripe, turning yellow-green with red-brown patches as they mature. Crisp texture. | Dark spots that enlarge and deepen in color. Fruit becomes sunken, with thickened skin and soft, discolored flesh. |
Pomegranate | Firm with thick, vibrant red or purple skin. Feels heavy, indicating juiciness. | Cracks in skin, mold growth, internal decay. Seeds discolored, unpleasant odor. |
Strawberry | Bright red with glossy surface and green caps. Firm but juicy, sweet fragrance. | Soft, mushy, often with visible mold. Off smell, color becomes dull and brownish. |
3. Data Exploration and Preprocessing
The dataset is well-balanced across the various fruit classes, which is beneficial for developing a reliable model. Since we are working with image data, no further preprocessing steps, such as handling missing values or outliers, are needed. Below are some sample images from the dataset to illustrate the variety and quality of the data.

Sample pictures of fresh and rotten fruits in the dataset
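As an illustration, a grid of sample images like the one above can be produced with a few lines of code. The following is a minimal sketch; the dataset path and folder layout (one subdirectory per class) are assumptions.

```python
import pathlib
import matplotlib.pyplot as plt
from tensorflow import keras

DATA_DIR = pathlib.Path("data/original")  # hypothetical dataset location
classes = sorted(p.name for p in DATA_DIR.iterdir() if p.is_dir())

# Show the first image of each of the 16 classes in a 4 x 4 grid
fig, axes = plt.subplots(4, 4, figsize=(10, 10))
for ax, cls in zip(axes.flat, classes):
    img_path = next((DATA_DIR / cls).glob("*"))  # first image file in the class folder
    ax.imshow(keras.utils.load_img(img_path))
    ax.set_title(cls, fontsize=8)
    ax.axis("off")
plt.tight_layout()
plt.show()
```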
4. Model Training
Model Selection and Training
We will experiment with three variations of a Convolutional Neural Network (CNN):
- Baseline Model: A simple CNN with three convolutional layers.
- Enhanced Model: A CNN with three convolutional layers, Batch Normalization layers and dropout for regularization.
- VGG16 Pretrained Model: A CNN model based on the ImageNet pretrained VGG16, customized with a final Dense layer and an Output layer.
Each model is trained and evaluated on both the original dataset and the augmented dataset. Each image dataset is split into 80% training data and 20% test data used for validation; a sketch of this setup follows.
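As an illustration, the following is a minimal sketch of this setup in Keras. The directory path is hypothetical, and the 150 × 150 input size is an assumption inferred from the layer parameter counts reported below; only the 80/20 split, the batch size of 20, and the tracked metrics follow directly from the text.

```python
from tensorflow import keras

IMG_SIZE = (150, 150)       # assumption, consistent with the Flatten size reported below
BATCH_SIZE = 20
DATA_DIR = "data/original"  # hypothetical path; "data/augmented" for the augmented set

# 80/20 split into training data and test data used for validation
train_ds = keras.utils.image_dataset_from_directory(
    DATA_DIR, validation_split=0.2, subset="training", seed=42,
    image_size=IMG_SIZE, batch_size=BATCH_SIZE, label_mode="categorical")
val_ds = keras.utils.image_dataset_from_directory(
    DATA_DIR, validation_split=0.2, subset="validation", seed=42,
    image_size=IMG_SIZE, batch_size=BATCH_SIZE, label_mode="categorical")

# Metrics tracked per epoch (the F1 score can be derived from precision and recall)
METRICS = [
    keras.metrics.CategoricalAccuracy(name="accuracy"),
    keras.metrics.Precision(name="precision"),
    keras.metrics.Recall(name="recall"),
    keras.metrics.AUC(name="auc"),
]
```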
CNN Baseline Model
The CNN Baseline Model consists of three convolutional blocks, each with a convolutional layer and a pooling layer, followed by a flatten layer, a dense layer, and the output layer. The total number of parameters is 4,762,928, all of which are trainable.
The table below summarizes the parameters in each layer; the architecture is shown in Appendix 1.
No. | Layer Name | Param Count | Trainable Params | Non-trainable Params
---|---|---|---|---
0 | Conv | 448 | 448 | 0 |
1 | Pooling | 0 | 0 | 0 |
2 | Conv | 4640 | 4640 | 0 |
3 | Pooling | 0 | 0 | 0 |
4 | Conv | 18496 | 18496 | 0 |
5 | Pooling | 0 | 0 | 0 |
6 | Flatten | 0 | 0 | 0 |
7 | Dense | 4735232 | 4735232 | 0 |
8 | Output | 4112 | 4112 | 0 |
9 | Total | 4762928 | 4762928 | 0 |
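A minimal Keras sketch of an architecture consistent with the parameter counts above is shown below. The filter counts of 16/32/64, the 256-unit dense layer, and the 16 output classes all follow from the table; the 150 × 150 × 3 input shape is an assumption inferred from the Flatten size of 17 × 17 × 64 = 18,496.

```python
from tensorflow import keras
from tensorflow.keras import layers

baseline = keras.Sequential([
    layers.Input(shape=(150, 150, 3)),        # inferred input size (assumption)
    layers.Conv2D(16, 3, activation="relu"),  # 448 params
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),  # 4,640 params
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),  # 18,496 params
    layers.MaxPooling2D(),
    layers.Flatten(),                         # 17 * 17 * 64 = 18,496 units
    layers.Dense(256, activation="relu"),     # 4,735,232 params
    layers.Dense(16, activation="softmax"),   # 4,112 params, one unit per class
])

# Compile and train, reusing the data pipeline and METRICS list sketched above
baseline.compile(optimizer="adam", loss="categorical_crossentropy", metrics=METRICS)
history = baseline.fit(train_ds, validation_data=val_ds, epochs=20)
```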
Below are the results for the original and the augmented dataset after 20 epochs of training with a batch size of 20. Both scenarios show improved performance towards the end of the epochs, with no signs of overfitting. The use of the larger augmented dataset results in improved performance.

Accuracy for CNN Basemodel

Loss for CNN Basemodel
Epoch | Accuracy | Precision | Recall | F1Score | AUC
---|---|---|---|---|---
1 | 0.643750 | 0.756813 | 0.564062 | 0.646374 | 0.965654 |
2 | 0.696875 | 0.787321 | 0.601562 | 0.682019 | 0.970222 |
3 | 0.756250 | 0.784615 | 0.717188 | 0.749388 | 0.972299 |
4 | 0.795313 | 0.818627 | 0.782812 | 0.800319 | 0.980803 |
5 | 0.793750 | 0.795455 | 0.765625 | 0.780255 | 0.970285 |
6 | 0.810938 | 0.821018 | 0.781250 | 0.800641 | 0.973026 |
7 | 0.809375 | 0.826299 | 0.795313 | 0.810510 | 0.973802 |
8 | 0.804688 | 0.814992 | 0.798437 | 0.806630 | 0.966704 |
9 | 0.781250 | 0.790143 | 0.776563 | 0.783294 | 0.956404 |
10 | 0.751562 | 0.768233 | 0.740625 | 0.754177 | 0.948303 |
11 | 0.809375 | 0.823151 | 0.800000 | 0.811410 | 0.963672 |
12 | 0.835938 | 0.840190 | 0.829687 | 0.834906 | 0.966890 |
13 | 0.846875 | 0.852381 | 0.839063 | 0.845669 | 0.964101 |
14 | 0.857813 | 0.858491 | 0.853125 | 0.855799 | 0.964354 |
15 | 0.850000 | 0.853543 | 0.846875 | 0.850196 | 0.964391 |
16 | 0.853125 | 0.856240 | 0.846875 | 0.851532 | 0.962202 |
17 | 0.854688 | 0.858491 | 0.853125 | 0.855799 | 0.961562 |
18 | 0.853125 | 0.855118 | 0.848437 | 0.851765 | 0.961606 |
19 | 0.851562 | 0.855573 | 0.851562 | 0.853563 | 0.962333 |
20 | 0.851562 | 0.855118 | 0.848437 | 0.851765 | 0.961671 |
Metric | CNN Basemodel
---|---
Epoch | 14.000000 |
Accuracy | 0.857813 |
Precision | 0.858491 |
Recall | 0.853125 |
F1Score | 0.855799 |
AUC | 0.964354 |

Accuracy for CNN Basemodel with Augmented Dataset

Loss for CNN Basemodel with Augmented Dataset
Epoch | Accuracy | Precision | Recall | F1Score | AUC
---|---|---|---|---|---
1 | 0.777733 | 0.826967 | 0.730191 | 0.775572 | 0.984367 |
2 | 0.806583 | 0.827855 | 0.777733 | 0.802011 | 0.989519 |
3 | 0.835839 | 0.852225 | 0.824868 | 0.838323 | 0.986587 |
4 | 0.880536 | 0.891205 | 0.868753 | 0.879835 | 0.988442 |
5 | 0.874441 | 0.884520 | 0.868346 | 0.876358 | 0.987661 |
6 | 0.911012 | 0.918275 | 0.908574 | 0.913399 | 0.987672 |
7 | 0.880943 | 0.887974 | 0.876067 | 0.881980 | 0.984029 |
8 | 0.906136 | 0.911632 | 0.901260 | 0.906416 | 0.989616 |
9 | 0.901666 | 0.907749 | 0.899634 | 0.903673 | 0.987981 |
10 | 0.868753 | 0.877627 | 0.865502 | 0.871522 | 0.986445 |
11 | 0.913856 | 0.916395 | 0.913043 | 0.914716 | 0.985344 |
12 | 0.905323 | 0.908497 | 0.903698 | 0.906091 | 0.984169 |
13 | 0.903291 | 0.908979 | 0.900853 | 0.904898 | 0.985019 |
14 | 0.902072 | 0.908121 | 0.899634 | 0.903858 | 0.985156 |
15 | 0.892320 | 0.896975 | 0.891508 | 0.894233 | 0.983038 |
16 | 0.901260 | 0.907005 | 0.899634 | 0.903305 | 0.982904 |
17 | 0.912637 | 0.915137 | 0.911418 | 0.913274 | 0.982341 |
18 | 0.920764 | 0.921856 | 0.920358 | 0.921106 | 0.983386 |
19 | 0.885819 | 0.890389 | 0.884600 | 0.887485 | 0.977844 |
20 | 0.917920 | 0.920343 | 0.915482 | 0.917906 | 0.984839 |
Metric | CNN Augmented Data
---|---
Epoch | 18.000000 |
Accuracy | 0.920764 |
Precision | 0.921856 |
Recall | 0.920358 |
F1Score | 0.921106 |
AUC | 0.983386 |
CNN with Batch Normalization
The second CNN model adds Batch Normalization and Dropout layers to the Baseline Model. The total number of parameters increases slightly to 4,764,400, of which 4,763,664 are trainable and 736 are non-trainable (the moving means and variances of the Batch Normalization layers).
The table below summarizes the parameters in each layer; the architecture is shown in Appendix 2.
No. | Layer Name | Param Count | Trainable Params | Non-trainable Params
---|---|---|---|---
0 | conv2d | 448 | 448 | 0 |
1 | batch_normalization | 64 | 32 | 32 |
2 | max_pooling2d | 0 | 0 | 0 |
3 | dropout | 0 | 0 | 0 |
4 | conv2d_1 | 4640 | 4640 | 0 |
5 | batch_normalization_1 | 128 | 64 | 64 |
6 | max_pooling2d_1 | 0 | 0 | 0 |
7 | dropout_1 | 0 | 0 | 0 |
8 | conv2d_2 | 18496 | 18496 | 0 |
9 | batch_normalization_2 | 256 | 128 | 128 |
10 | max_pooling2d_2 | 0 | 0 | 0 |
11 | dropout_2 | 0 | 0 | 0 |
12 | flatten | 0 | 0 | 0 |
13 | dense | 4735232 | 4735232 | 0 |
14 | batch_normalization_3 | 1024 | 512 | 512 |
15 | dropout_3 | 0 | 0 | 0 |
16 | dense_1 | 4112 | 4112 | 0 |
17 | Total | 4764400 | 4763664 | 736 |
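A minimal sketch consistent with the layer table above is shown below. The Conv → BatchNorm → Pool → Dropout ordering follows the table; the dropout rates are assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

def conv_block(filters):
    # Conv -> BatchNorm -> Pool -> Dropout, mirroring the layer table above
    return [
        layers.Conv2D(filters, 3, activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(),
        layers.Dropout(0.25),  # dropout rate is an assumption
    ]

bn_model = keras.Sequential(
    [layers.Input(shape=(150, 150, 3))]       # same assumed input size as before
    + conv_block(16) + conv_block(32) + conv_block(64)
    + [
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.BatchNormalization(),          # 1,024 params, 512 of them non-trainable
        layers.Dropout(0.5),                  # dropout rate is an assumption
        layers.Dense(16, activation="softmax"),
    ]
)
bn_model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=METRICS)
```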
Below are the results for the original and the augmented dataset after 20 epochs of training with a batch size of 20. Training is noticeably less stable than for the baseline model, with metrics fluctuating between epochs, but both scenarios improve towards the end of the epochs and show no signs of overfitting. As with the previous model, the use of the larger augmented dataset results in improved performance.

Accuracy for CNN Batch Normalization

Loss for CNN Batch Normalization
Epoch | Accuracy | Precision | Recall | F1Score | AUC
---|---|---|---|---|---
1 | 0.073437 | 0.073437 | 0.073437 | 0.073437 | 0.520742 |
2 | 0.100000 | 0.108257 | 0.092188 | 0.099578 | 0.588592 |
3 | 0.368750 | 0.425358 | 0.325000 | 0.368468 | 0.840214 |
4 | 0.314063 | 0.336502 | 0.276563 | 0.303602 | 0.786681 |
5 | 0.723437 | 0.760908 | 0.681250 | 0.718879 | 0.966903 |
6 | 0.709375 | 0.737024 | 0.665625 | 0.699507 | 0.953470 |
7 | 0.487500 | 0.518707 | 0.476562 | 0.496743 | 0.819383 |
8 | 0.543750 | 0.592672 | 0.429688 | 0.498188 | 0.865377 |
9 | 0.573438 | 0.584844 | 0.554688 | 0.569366 | 0.896289 |
10 | 0.853125 | 0.868932 | 0.839063 | 0.853736 | 0.986776 |
11 | 0.746875 | 0.762684 | 0.728125 | 0.745004 | 0.959929 |
12 | 0.465625 | 0.497355 | 0.440625 | 0.467274 | 0.849185 |
13 | 0.720312 | 0.744646 | 0.706250 | 0.724940 | 0.936291 |
14 | 0.753125 | 0.771987 | 0.740625 | 0.755981 | 0.957774 |
15 | 0.560938 | 0.579565 | 0.540625 | 0.559418 | 0.848984 |
16 | 0.696875 | 0.710824 | 0.687500 | 0.698967 | 0.936877 |
17 | 0.859375 | 0.874799 | 0.851562 | 0.863025 | 0.985133 |
18 | 0.782812 | 0.807566 | 0.767187 | 0.786859 | 0.969630 |
19 | 0.803125 | 0.819063 | 0.792188 | 0.805401 | 0.962914 |
20 | 0.834375 | 0.855519 | 0.823438 | 0.839172 | 0.979792 |
Metric | CNN Batch Normalized
---|---
Epoch | 17.000000 |
Accuracy | 0.859375 |
Precision | 0.874799 |
Recall | 0.851562 |
F1Score | 0.863025 |
AUC | 0.985133 |

Accuracy for CNN Batch Normalization with Augmented Dataset

Loss for CNN Batch Normalization with Augmented Dataset
Epoch | Accuracy | Precision | Recall | F1Score | AUC
---|---|---|---|---|---
1 | 0.222674 | 0.239925 | 0.208046 | 0.222851 | 0.702998 |
2 | 0.460382 | 0.480301 | 0.440878 | 0.459746 | 0.793189 |
3 | 0.657863 | 0.691719 | 0.624543 | 0.656417 | 0.931353 |
4 | 0.777733 | 0.806565 | 0.748883 | 0.776654 | 0.965804 |
5 | 0.707030 | 0.733131 | 0.688744 | 0.710245 | 0.931503 |
6 | 0.789923 | 0.805731 | 0.776920 | 0.791063 | 0.956580 |
7 | 0.222674 | 0.241315 | 0.211703 | 0.225541 | 0.645281 |
8 | 0.820398 | 0.834238 | 0.811865 | 0.822900 | 0.975151 |
9 | 0.778952 | 0.791457 | 0.767981 | 0.779542 | 0.966592 |
10 | 0.706623 | 0.729149 | 0.689151 | 0.708586 | 0.938812 |
11 | 0.873222 | 0.889537 | 0.867127 | 0.878189 | 0.984731 |
12 | 0.865908 | 0.872122 | 0.861845 | 0.866953 | 0.984704 |
13 | 0.916701 | 0.922446 | 0.913450 | 0.917926 | 0.991996 |
14 | 0.826493 | 0.843384 | 0.818367 | 0.830687 | 0.972769 |
15 | 0.915075 | 0.921117 | 0.911012 | 0.916037 | 0.989326 |
16 | 0.891914 | 0.898396 | 0.887444 | 0.892886 | 0.989322 |
17 | 0.887850 | 0.898713 | 0.879724 | 0.889117 | 0.987485 |
18 | 0.811459 | 0.821473 | 0.802113 | 0.811678 | 0.971168 |
19 | 0.837058 | 0.847395 | 0.832588 | 0.839926 | 0.974790 |
20 | 0.952458 | 0.958882 | 0.947582 | 0.953198 | 0.996560 |
Metric | CNN Batch Normalized Augmented Data
---|---
Epoch | 20.000000 |
Accuracy | 0.952458 |
Precision | 0.958882 |
Recall | 0.947582 |
F1Score | 0.953198 |
AUC | 0.996560 |
VGG16 Pretrained Model
The third model is based on the pretrained VGG16. VGG16 is a deep convolutional neural network that was trained on the ImageNet dataset, which contains millions of images across 1,000 classes. The model has 16 weight layers, including 13 convolutional layers, designed to extract hierarchical features from images, making it highly effective for various image classification tasks. VGG16 pretrained on ImageNet is commonly used for transfer learning, allowing the model to be fine-tuned for specific tasks with relatively small amounts of new data. For customization we added four layers: a global average pooling layer, a dense layer, a dropout layer, and the final output layer. The total number of parameters is 14,985,552, of which only 270,864 are trainable.
The table below summarizes the parameters in each layer; the architecture is shown in Appendix 3.
No. | Layer Name | Param Count | Trainable Params | Non-trainable Params
---|---|---|---|---
0 | input_layer_2 | 0 | 0 | 0 |
1 | block1_conv1 | 1792 | 0 | 1792 |
2 | block1_conv2 | 36928 | 0 | 36928 |
3 | block1_pool | 0 | 0 | 0 |
4 | block2_conv1 | 73856 | 0 | 73856 |
5 | block2_conv2 | 147584 | 0 | 147584 |
6 | block2_pool | 0 | 0 | 0 |
7 | block3_conv1 | 295168 | 0 | 295168 |
8 | block3_conv2 | 590080 | 0 | 590080 |
9 | block3_conv3 | 590080 | 0 | 590080 |
10 | block3_pool | 0 | 0 | 0 |
11 | block4_conv1 | 1180160 | 0 | 1180160 |
12 | block4_conv2 | 2359808 | 0 | 2359808 |
13 | block4_conv3 | 2359808 | 0 | 2359808 |
14 | block4_pool | 0 | 0 | 0 |
15 | block5_conv1 | 2359808 | 0 | 2359808 |
16 | block5_conv2 | 2359808 | 0 | 2359808 |
17 | block5_conv3 | 2359808 | 0 | 2359808 |
18 | block5_pool | 0 | 0 | 0 |
19 | global_average_pooling2d | 0 | 0 | 0 |
20 | dense_2 | 262656 | 262656 | 0 |
21 | dropout_4 | 0 | 0 | 0 |
22 | dense_3 | 8208 | 8208 | 0 |
23 | Total | 14985552 | 270864 | 14714688 |
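A minimal sketch consistent with the table above is shown below. The 512-unit dense layer follows from the 262,656 parameters of dense_2; the dropout rate and input size are assumptions. In practice the inputs would also typically be passed through VGG16's preprocessing, e.g. keras.applications.vgg16.preprocess_input.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Load VGG16 without its classification head and freeze all pretrained layers
base = keras.applications.VGG16(weights="imagenet", include_top=False,
                                input_shape=(150, 150, 3))  # input size is an assumption
base.trainable = False

vgg_model = keras.Sequential([
    base,                                    # 14,714,688 frozen params
    layers.GlobalAveragePooling2D(),         # 512-dim feature vector
    layers.Dense(512, activation="relu"),    # 262,656 trainable params
    layers.Dropout(0.5),                     # dropout rate is an assumption
    layers.Dense(16, activation="softmax"),  # 8,208 trainable params
])
vgg_model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=METRICS)
```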
Below are the results for the original and the augmented dataset after 20 epochs of training with a batch size of 20. Both scenarios show improved performance towards the end of the epochs, with no signs of overfitting. As with the previous two models, the use of the larger augmented dataset results in improved performance.

Accuracy for VGG Pretrained

Loss for VGG Pretrained
Epoch | Accuracy | Precision | Recall | F1Score | AUC
---|---|---|---|---|---
1 | 0.640625 | 0.878307 | 0.259375 | 0.400483 | 0.967135 |
2 | 0.729688 | 0.848485 | 0.481250 | 0.614158 | 0.977944 |
3 | 0.810938 | 0.902655 | 0.637500 | 0.747253 | 0.988430 |
4 | 0.828125 | 0.912525 | 0.717188 | 0.803150 | 0.991159 |
5 | 0.845312 | 0.908571 | 0.745313 | 0.818884 | 0.991934 |
6 | 0.843750 | 0.896552 | 0.771875 | 0.829555 | 0.992498 |
7 | 0.854688 | 0.903114 | 0.815625 | 0.857143 | 0.992099 |
8 | 0.851562 | 0.892734 | 0.806250 | 0.847291 | 0.993120 |
9 | 0.853125 | 0.887564 | 0.814062 | 0.849226 | 0.992063 |
10 | 0.879687 | 0.913706 | 0.843750 | 0.877336 | 0.995132 |
11 | 0.895312 | 0.927973 | 0.865625 | 0.895715 | 0.996097 |
12 | 0.892187 | 0.914430 | 0.851562 | 0.881877 | 0.994368 |
13 | 0.884375 | 0.905941 | 0.857813 | 0.881220 | 0.994542 |
14 | 0.862500 | 0.891847 | 0.837500 | 0.863819 | 0.994917 |
15 | 0.896875 | 0.927512 | 0.879687 | 0.902967 | 0.996260 |
16 | 0.867188 | 0.882068 | 0.853125 | 0.867355 | 0.994096 |
17 | 0.898438 | 0.912052 | 0.875000 | 0.893142 | 0.995553 |
18 | 0.881250 | 0.898223 | 0.868750 | 0.883241 | 0.996280 |
19 | 0.900000 | 0.914928 | 0.890625 | 0.902613 | 0.996317 |
20 | 0.870313 | 0.888889 | 0.862500 | 0.875496 | 0.995460 |
Metric | VGG Pretrained
---|---
Epoch | 19.000000 |
Accuracy | 0.900000 |
Precision | 0.914928 |
Recall | 0.890625 |
F1Score | 0.902613 |
AUC | 0.996317 |

Accuracy for VGG Pretrained with Augmented Dataset

Loss for VGG Pretrained with Augmented Dataset
Epoch | Accuracy | Precision | Recall | F1Score | AUC
---|---|---|---|---|---
1 | 0.817554 | 0.920379 | 0.671678 | 0.776603 | 0.989430 |
2 | 0.871191 | 0.914874 | 0.812271 | 0.860525 | 0.994493 |
3 | 0.886225 | 0.921724 | 0.851686 | 0.885322 | 0.995999 |
4 | 0.895571 | 0.925315 | 0.865908 | 0.894626 | 0.996805 |
5 | 0.919545 | 0.938004 | 0.897603 | 0.917359 | 0.997806 |
6 | 0.915888 | 0.937025 | 0.900853 | 0.918583 | 0.997433 |
7 | 0.915888 | 0.934516 | 0.898822 | 0.916321 | 0.997447 |
8 | 0.921577 | 0.936762 | 0.902885 | 0.919512 | 0.997131 |
9 | 0.928078 | 0.941962 | 0.916701 | 0.929160 | 0.997956 |
10 | 0.931329 | 0.945878 | 0.923202 | 0.934403 | 0.996902 |
11 | 0.934986 | 0.947566 | 0.925234 | 0.936266 | 0.998272 |
12 | 0.928891 | 0.939469 | 0.920764 | 0.930023 | 0.996794 |
13 | 0.939049 | 0.947521 | 0.931735 | 0.939562 | 0.996833 |
14 | 0.943113 | 0.952066 | 0.936205 | 0.944069 | 0.997300 |
15 | 0.943113 | 0.948971 | 0.937018 | 0.942956 | 0.996994 |
16 | 0.936611 | 0.942387 | 0.930516 | 0.936414 | 0.996043 |
17 | 0.943519 | 0.950454 | 0.935392 | 0.942863 | 0.997393 |
18 | 0.939862 | 0.944856 | 0.932954 | 0.938867 | 0.997261 |
19 | 0.945144 | 0.953048 | 0.940268 | 0.946615 | 0.996702 |
20 | 0.939862 | 0.943535 | 0.937018 | 0.940265 | 0.997226 |
Metric | VGG Pretrained Augmented Data
---|---
Epoch | 19.000000 |
Accuracy | 0.945144 |
Precision | 0.953048 |
Recall | 0.940268 |
F1Score | 0.946615 |
AUC | 0.996702 |
5. Model Evaluation and Recommendation
Model Evaluation
Each model's performance was evaluated using accuracy, precision, recall, F1 score and AUC. The table below summarizes the results:
Model | Epoch | Accuracy | Precision | Recall | F1Score | AUC
---|---|---|---|---|---|---
CNN Basemodel | 14.0 | 0.857813 | 0.858491 | 0.853125 | 0.855799 | 0.964354 |
CNN Augmented Data | 18.0 | 0.920764 | 0.921856 | 0.920358 | 0.921106 | 0.983386 |
CNN Batch Normalized | 17.0 | 0.859375 | 0.874799 | 0.851562 | 0.863025 | 0.985133 |
CNN Batch Normalized Augmented Data | 20.0 | 0.952458 | 0.958882 | 0.947582 | 0.953198 | 0.996560 |
VGG Pretrained | 19.0 | 0.900000 | 0.914928 | 0.890625 | 0.902613 | 0.996317 |
VGG Pretrained Augmented Data | 19.0 | 0.945144 | 0.953048 | 0.940268 | 0.946615 | 0.996702 |
Model Recommendation
The CNN Batch Normalized Model with Augmented Data achieves the best validation accuracy and also performs strongly on all other metrics, with a good balance between precision and recall. This model is well-suited for the task of classifying fresh and rotten fruits.
Below is the resulting confusion matrix for the augmented dataset with a validation set of 2,461 images.

Confusion matrix for CNN Batch Normalized Model with Augmented Data
The confusion matrix confirms the high accuracy of the model. The main outlier is the 31 cases of Rotten Apples that were classified as Rotten Pomegranates. Rotten Oranges, on the other hand, were classified perfectly.
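A sketch of how such a confusion matrix can be produced from the validation set is shown below; it reuses the hypothetical val_ds and bn_model names from the earlier sketches.

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

# One pass over the validation set keeps labels and predictions aligned
y_true, y_pred = [], []
for images, labels in val_ds:
    y_true.append(np.argmax(labels, axis=1))
    y_pred.append(np.argmax(bn_model.predict(images, verbose=0), axis=1))
y_true, y_pred = np.concatenate(y_true), np.concatenate(y_pred)

cm = confusion_matrix(y_true, y_pred)
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues",
            xticklabels=val_ds.class_names, yticklabels=val_ds.class_names)
plt.xlabel("Predicted class")
plt.ylabel("True class")
plt.show()
```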
Below we present 15 examples of misclassification, one for each class of fruit (except for Rotten Oranges, which, as noted above, had no misclassifications).

Examples of misclassification for CNN Batch Normalized Model with Augmented Data
We can summarize the performance for each class as shown in the Classification Report Heatmap below. Rotten Oranges have a perfect Recall value. Perfect Precision values can be found for Rotten Guavas, Rotten Bananas and Rotten Strawberries.

Classification Report for CNN Batch Normalized Model with Augmented Data
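A per-class report like the one visualized above can be generated as in the following sketch, reusing the hypothetical y_true and y_pred arrays from the confusion matrix sketch.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import classification_report

report = classification_report(y_true, y_pred,
                               target_names=val_ds.class_names,
                               output_dict=True)
report_df = pd.DataFrame(report).T  # rows: classes; columns: precision/recall/f1/support

# Heatmap of precision, recall, and F1 score for the 16 fruit classes
sns.heatmap(report_df.iloc[:16, :3], annot=True, cmap="Blues")
plt.show()
```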
6. Key Findings and Insights
Summary of Findings
- The CNN Batch Normalized Model with Augmented Data achieved the highest accuracy of 95%, making it the most effective model for this classification task.
- Data Augmentation played a crucial role in enhancing model performance by introducing more diversity in the training data.
- The Baseline Model provided a solid starting point but was outperformed by the more complex models.
- The Enhanced Model showed improved results, demonstrating the benefits of Batch Normalization.
- Overall, the CNN Batch Normalized Model with Augmented Data strikes a good balance between accuracy and model complexity, making it the optimal choice for this application.
- The Pretrained VGG Model with Augmented Data shows similar performance and could be considered as an alternative, although it is more complex and computationally intensive.
Insights
- Impact of Augmentation: The application of data augmentation techniques significantly improved the model's generalization capability, suggesting that further exploration of augmentation strategies could yield even better results (a sketch of such augmentation operations follows this list).
- Model Complexity: While increasing the depth of the network improved performance, it also increased training time and computational cost, highlighting the trade-offs involved in model selection.
- Future Directions: Exploring more advanced architectures such as ResNet or using ensemble methods could potentially lead to further improvements in classification accuracy.
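The augmented images in this dataset were provided by the dataset authors; as an illustration of the kind of on-the-fly augmentation that could be explored further, the following is a minimal Keras sketch. The specific transformations and their ranges are assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical augmentation pipeline; transformation ranges are assumptions
augment = keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),   # up to +/- 10% of a full turn
    layers.RandomZoom(0.1),
    layers.RandomContrast(0.1),
])

# Applied to the training pipeline only, never to the validation data
train_aug = train_ds.map(lambda x, y: (augment(x, training=True), y))
```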
7. Conclusion and Next Steps
Conclusion
This analysis demonstrates the effectiveness of using a Convolutional Neural Network (CNN) for classifying fruit quality based on images. Among the models tested, the CNN Batch Normalized Model with Augmented Data stands out as the most reliable, achieving the highest accuracy and offering a balanced performance in terms of precision and recall. This model is well-suited for the task of distinguishing between fresh and rotten fruits and can serve as a robust solution for automated fruit quality assessment.
Suggestions for Future Work
- Explore Additional Data: Incorporating more fruit varieties or expanding the dataset with additional images could enhance the model’s robustness and generalizability.
- Experiment with Advanced Architectures: Testing more complex architectures such as ResNet, Inception, or DenseNet might yield further improvements in classification accuracy and model performance.
- Ensemble Methods: Investigating the use of ensemble methods, where multiple models are combined, could potentially increase the reliability of the predictions.
- Real-World Application: Consider deploying the model in a real-world setting to automate the fruit quality assessment process in retail or agricultural environments. This could involve integrating the model into an existing supply chain management system.
- Optimize for Deployment: Further work could focus on optimizing the model for deployment, including reducing its size and improving inference speed without sacrificing accuracy.
Appendix

Appendix 1: CNN Basemodel Layers

Appendix 2: CNN Batch Normalization Model Layers

Appendix 3: VGG Pretrained Model Layers