COVID-19 Chest X-ray Classification using CNNs
This is a Machine Learning project under the broader field of Data Science, aimed at building a Convolutional Neural Network (CNN) model to classify COVID-19 and non-COVID chest X-ray images. It demonstrates my ability to preprocess medical image data, develop and optimize deep learning architectures, and interpret model performance for real-world healthcare applications.
Organizer
Universitas Gadjah Mada
Role
Data Scientist / Machine Learning Engineer
Partner
Solo Project
Client
Academic Project or Internal Research
Intro




Understanding the Diagnostic Challenge
During the COVID-19 outbreak, chest X-rays became an important tool for fast diagnosis. This project builds a CNN model that classifies chest X-ray images into three classes: COVID-19, Normal, and Viral Pneumonia. The goal is to support early detection and differentiation of respiratory diseases.
Dataset
Source: COVID-19 Radiography Database
3 Classes: COVID-19, Normal, Viral Pneumonia
Size: 3,616 COVID-19, 10,192 Normal, 1,345 Viral Pneumonia images
Model Architecture (CNN)
Input (150x150x3)
↓ Conv2D (32 filters) + MaxPooling
↓ Conv2D (64 filters) + MaxPooling
↓ Flatten
↓ Dense (128) + ReLU
↓ Dropout (0.5)
↓ Dense (3) with Softmax
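The layer stack above could be written in Keras roughly as follows. This is a minimal sketch, not the project's actual code: the 3x3 kernels, 2x2 pooling windows, ReLU activations on the convolutional layers, and default "valid" padding are assumptions, since the page only gives filter counts. The function name `build_model` is hypothetical. The compile settings follow the procedure's "Compile Model" step.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_model(input_shape=(150, 150, 3), num_classes=3):
    """Small CNN following the listed layer order (hyperparameters partly assumed)."""
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        # Kernel size (3x3) and ReLU are assumptions; only "32 filters" is stated.
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),  # 2x2 pooling window is an assumption
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        # One output unit per class, softmax for class probabilities.
        layers.Dense(num_classes, activation="softmax"),
    ])
    # Adam, categorical crossentropy, and accuracy as named in the procedure.
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


model = build_model()
model.summary()
```

Under these assumptions, most of the parameters sit in the first dense layer, because the flattened feature map is large (36 x 36 x 64 values). That is one reason dropout is placed right after it.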
Improvements from Original Experiment
| Aspect | Original | My Version |
|---|---|---|
| Dataset Handling | Raw zip used as-is | Cleaned & verified |
| Augmentation | May be absent | ✅ Added |
| Overfitting Prevention | Not handled | ✅ Dropout used |
| Evaluation | Accuracy only | ✅ Accuracy + Loss + Graphs |
| Visualization | Minimal | ✅ Informative plots |
Procedure
Key Procedures
| No | Step | Brief Description |
|---|---|---|
| 1 | Download Dataset | Use Kaggle API ( |
| 2 | Extract & Check Structure | Unzip the dataset and review folder vs metadata file structure |
| 3 | Remove Unused Files | Ignore |
| 4 | Split into Train & Validation | Create |
| 5 | Image Preprocessing | Resize images (e.g., 150x150), normalize (1./255), apply augmentation (optional) |
| 6 | Build CNN Model | Define a simple CNN: Conv2D → MaxPooling → Flatten → Dense → Dropout → Output |
| 7 | Compile Model | Use Adam optimizer, categorical crossentropy loss, and accuracy as metric |
| 8 | Train the Model | Train on the dataset for several epochs, monitor accuracy and loss |
| 9 | Evaluate Performance | Plot training/validation curves, generate confusion matrix and classification report |
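The train/validation split in step 4 can be illustrated with a per-class (stratified) split, so that each class keeps the same proportion in both subsets. This is only a sketch under assumptions: the page does not state the split ratio or method, so the 80/20 ratio, the fixed seed, the function name `stratified_split`, and the file names below are all hypothetical.

```python
import random


def stratified_split(files_by_class, val_fraction=0.2, seed=42):
    """Split each class's file list separately so class proportions are preserved."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, val = {}, {}
    for label, files in files_by_class.items():
        shuffled = sorted(files)  # sort first so the result does not depend on input order
        rng.shuffle(shuffled)
        n_val = int(round(len(shuffled) * val_fraction))
        val[label] = shuffled[:n_val]
        train[label] = shuffled[n_val:]
    return train, val


# Hypothetical file names, for illustration only.
demo_files = {
    "COVID-19": [f"covid_{i}.png" for i in range(10)],
    "Normal": [f"normal_{i}.png" for i in range(20)],
    "Viral Pneumonia": [f"pneumonia_{i}.png" for i in range(5)],
}

train_files, val_files = stratified_split(demo_files)
for label in demo_files:
    print(label, len(train_files[label]), "train /", len(val_files[label]), "validation")
```

Splitting per class matters here because the classes are very uneven in size; a plain random split could leave the smallest class underrepresented in the validation set.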
Diagram Flow

Tools








Conclusion




Model Performance Summary
This project demonstrates a reproducible CNN-based model for multiclass classification of chest X-ray images into three categories: COVID-19, Normal, and Viral Pneumonia. With data cleaning, augmentation, and training monitoring in place, validation accuracy stayed on par with training accuracy, which suggests the model generalized without obvious overfitting.
The model achieved a peak validation accuracy of 86.94% at epoch 9. Final results include:
Training accuracy: 81.6%
Validation accuracy: 82.4%
Loss: consistently decreasing, indicating stable learning
These results show that even a relatively simple CNN architecture can yield strong performance when supported by good data practices.
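Overall accuracy alone can hide weak performance on the smaller classes, which is why the procedure also lists a confusion matrix. The sketch below shows how a confusion matrix and per-class recall could be computed from true and predicted labels. The labels are made-up toy values for illustration, not the project's predictions, and the helper names are hypothetical.

```python
from collections import Counter

CLASSES = ["COVID-19", "Normal", "Viral Pneumonia"]


def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in classes] for t in classes]


def per_class_recall(matrix):
    """Fraction of each true class that was predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


# Toy labels (illustrative only, not real results).
y_true = ["COVID-19", "COVID-19", "Normal", "Normal", "Normal", "Viral Pneumonia"]
y_pred = ["COVID-19", "Normal", "Normal", "Normal", "Normal", "Viral Pneumonia"]

matrix = confusion_matrix(y_true, y_pred, CLASSES)
recall = per_class_recall(matrix)

for label, row, r in zip(CLASSES, matrix, recall):
    print(f"{label:16s} {row}  recall={r:.2f}")
```

In this toy case five of six predictions are correct, yet recall for COVID-19 is only 0.5, which is the kind of gap a single accuracy number would not reveal.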
Future Work
To further improve this research, the following directions can be explored:
Transfer Learning: Integrate more powerful architectures like EfficientNet, ResNet, or DenseNet for better accuracy and feature extraction.
Model Interpretability: Use Grad-CAM or SHAP to visualize which parts of the lungs the model focuses on when making decisions.
Class Imbalance Handling: Apply techniques such as focal loss or class weighting to balance the learning process across underrepresented classes.
Deployment: Convert the model to TensorFlow Lite or ONNX for real-time inference in mobile or clinical environments.
Broader Dataset: Include CT scans or datasets from different sources to enhance robustness and reduce bias.
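As an illustration of the class-weighting idea, the sketch below derives inverse-frequency weights from the image counts listed in the Dataset section. The "balanced" formula (total / (number of classes x class count)) is one common choice and an assumption here, not something the project is stated to have used; the function name is hypothetical.

```python
# Image counts as listed in the Dataset section.
CLASS_COUNTS = {"COVID-19": 3616, "Normal": 10192, "Viral Pneumonia": 1345}


def balanced_class_weights(counts):
    """Weight each class by total / (n_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * count) for label, count in counts.items()}


weights = balanced_class_weights(CLASS_COUNTS)
for label, weight in weights.items():
    print(f"{label:16s} weight={weight:.3f}")
```

With these counts the rarest class, Viral Pneumonia, receives the largest weight and Normal the smallest. In Keras, such weights can be passed to `model.fit` through its `class_weight` argument, keyed by integer class index rather than by label name.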
CATEGORY
COVID-19
Classification
Convolutional Neural Network (CNN)
Machine Learning
DURATION
Apr 2024



