Projects | Shiva Augusta UI/UX Portfolio

COVID-19 Chest X-ray Classification using CNNs

This is a Machine Learning project under the broader field of Data Science, aimed at building a Convolutional Neural Network (CNN) model to classify COVID-19 and non-COVID chest X-ray images. It demonstrates my ability to preprocess medical image data, develop and optimize deep learning architectures, and interpret model performance for real-world healthcare applications.

Organizer

Universitas Gadjah Mada

Role

Data Scientist / Machine Learning Engineer

Partner

Solo Project

Client

Academic Project or Internal Research

Intro

Understanding the Diagnostic Challenge

Since the COVID-19 outbreak, chest X-rays have become an important tool for rapid diagnosis. This project builds a CNN model that classifies chest X-ray images into three classes: COVID-19, Normal, and Viral Pneumonia. The goal is to support early detection of COVID-19 and help distinguish it efficiently from other respiratory conditions.

Dataset

Source: COVID-19 Radiography Database

3 Classes: COVID-19, Normal, Viral Pneumonia
Size: 3,616 COVID-19, 10,192 Normal, and 1,345 Viral Pneumonia images (15,153 in total)

Model Architecture (CNN)

Input (150x150x3)
↓
Conv2D (32 filters) + MaxPooling
↓
Conv2D (64 filters) + MaxPooling
↓
Flatten
↓
Dense (128) + ReLU
↓
Dropout (0.5)
↓
Dense (3) with Softmax
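The diagram does not state kernel sizes, strides, or padding for the convolutional layers. The sketch below assumes 3x3 kernels, stride 1, "valid" (no) padding, and 2x2 max pooling, which are common choices for this kind of model but are assumptions here, and walks the 150x150x3 input through each layer to show its output shape and trainable-parameter count. It uses only the Python standard library, so it runs without TensorFlow.

```python
# Shape and parameter walk-through for the CNN diagram.
# ASSUMPTIONS (not stated in the diagram): 3x3 kernels, stride 1,
# 'valid' padding, and non-overlapping 2x2 max pooling.

def conv_out(size: int, kernel: int = 3, stride: int = 1) -> int:
    """Spatial size after an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1


def pool_out(size: int, pool: int = 2) -> int:
    """Spatial size after max pooling whose stride equals the pool size."""
    return (size - pool) // pool + 1


def summarize(height=150, width=150, channels=3, kernel=3, num_classes=3):
    """Return (layer name, output shape, trainable params) for each layer."""
    layers = []
    h, w, c = height, width, channels
    for filters in (32, 64):
        # Each filter has kernel*kernel*c weights plus one bias.
        params = kernel * kernel * c * filters + filters
        h, w, c = conv_out(h, kernel), conv_out(w, kernel), filters
        layers.append((f"Conv2D({filters})", (h, w, c), params))
        h, w = pool_out(h), pool_out(w)
        layers.append(("MaxPooling2D", (h, w, c), 0))
    flat = h * w * c
    layers.append(("Flatten", (flat,), 0))
    layers.append(("Dense(128) + ReLU", (128,), flat * 128 + 128))
    layers.append(("Dropout(0.5)", (128,), 0))  # dropout has no weights
    layers.append((f"Dense({num_classes}) + Softmax", (num_classes,),
                   128 * num_classes + num_classes))
    return layers


if __name__ == "__main__":
    table = summarize()
    for name, shape, params in table:
        print(f"{name:<22} {str(shape):<16} {params:>12,}")
    print(f"{'Total':<22} {'':<16} {sum(p for _, _, p in table):>12,}")
```

Under these assumptions the Dense(128) layer right after Flatten holds about 10.6 million of the roughly 10.64 million trainable parameters, so it is the part of the network most prone to overfitting and a natural place for the Dropout(0.5) layer.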

Improvements from Original Experiment

Aspect	Original	My Version
Dataset Handling	Raw zip used as-is	Cleaned & verified
Augmentation	May be absent	✅ Added
Overfitting Prevention	Not handled	✅ Dropout used
Evaluation	Accuracy only	✅ Accuracy + Loss + Graphs
Visualization	Minimal	✅ Informative plots

Procedure

Key Procedures

No	Step	Brief Description
1	Download Dataset	Use Kaggle API (`kaggle.json`) to download the COVID-19 Radiography dataset
2	Extract & Check Structure	Unzip the dataset and compare the image folders with the accompanying metadata files
3	Remove Unused Files	Ignore `.xlsx` metadata files, keep only image folders (`COVID`, `Normal`, etc.)
4	Split into Train & Validation	Create `train/` and `val/` folders, split images randomly (e.g., 80:20 ratio)
5	Image Preprocessing	Resize images (e.g., 150x150), rescale pixel values to [0, 1] (1./255), and optionally apply augmentation
6	Build CNN Model	Define a simple CNN: Conv2D → MaxPooling → Flatten → Dense → Dropout → Output
7	Compile Model	Use the Adam optimizer, categorical cross-entropy loss, and accuracy as the metric
8	Train the Model	Train on the training split for several epochs while monitoring training and validation accuracy and loss
9	Evaluate Performance	Plot training/validation curves, generate confusion matrix and classification report
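Step 4 (a random 80:20 split into `train/` and `val/`) can be done with the Python standard library alone. The sketch below assumes the extracted dataset has one sub-folder of images per class (for example `COVID/`, `Normal/`, `Viral Pneumonia/`) directly under a source directory; if your copy nests the images one level deeper, adjust the paths accordingly. The function name `split_dataset`, the seed, and the accepted file extensions are illustrative choices, not taken from the original project. Each class is shuffled and split separately so both subsets keep the dataset's class proportions, and files that are not images (such as the `.xlsx` metadata from step 3) are skipped.

```python
import random
import shutil
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg"}  # assumption: image formats in the dataset


def split_dataset(src: Path, dst: Path, val_ratio: float = 0.2, seed: int = 42) -> dict:
    """Copy images from src/<class>/ into dst/train/<class>/ and dst/val/<class>/.

    Each class is shuffled and split on its own (a stratified split), so the
    train and val sets keep roughly the same class proportions.
    Returns {class_name: (n_train, n_val)}.
    """
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    counts = {}
    # Only sub-directories are treated as classes; loose files are ignored.
    for class_dir in sorted(p for p in src.iterdir() if p.is_dir()):
        images = sorted(p for p in class_dir.iterdir()
                        if p.suffix.lower() in IMAGE_EXTS)
        rng.shuffle(images)
        n_val = int(len(images) * val_ratio)
        subsets = {"val": images[:n_val], "train": images[n_val:]}
        for subset, files in subsets.items():
            out_dir = dst / subset / class_dir.name
            out_dir.mkdir(parents=True, exist_ok=True)
            for f in files:
                shutil.copy2(f, out_dir / f.name)
        counts[class_dir.name] = (len(subsets["train"]), n_val)
    return counts
```

A call such as `split_dataset(Path("extracted_dataset"), Path("data"))`, where both paths are hypothetical, produces a `data/train/<class>/` and `data/val/<class>/` layout, which is the one-folder-per-class structure that Keras directory-based loaders read.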

Conclusion

Model Performance Summary

This project demonstrates a reproducible CNN-based model for multiclass classification of chest X-ray images into three categories: COVID-19, Normal, and Viral Pneumonia. With data cleaning, augmentation, and monitoring of training and validation metrics, the model generalized well: validation accuracy stayed close to, and slightly above, training accuracy, with no sign of overfitting.

The model reached a peak validation accuracy of 86.94% at epoch 9. Results at the final epoch:

Training accuracy: 81.6%
Validation accuracy: 82.4%
Loss: consistently decreasing, indicating stable learning

These results show that even a relatively simple CNN architecture can yield strong performance when supported by good data practices.
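Step 9 of the procedure mentions a confusion matrix and classification report, while the summary above reports accuracy only. Because Normal images make up about two-thirds of the dataset, per-class recall is worth reporting next to overall accuracy. The sketch below computes a confusion matrix and per-class precision, recall, and F1 from lists of true and predicted class indices using only the standard library; `sklearn.metrics.confusion_matrix` and `sklearn.metrics.classification_report` do the same job in practice. The label lists in the example are made up for illustration and are not results from this model.

```python
CLASSES = ["COVID-19", "Normal", "Viral Pneumonia"]


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def per_class_report(cm):
    """Return (precision, recall, f1, support) for each class."""
    n = len(cm)
    report = []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[r][k] for r in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum (support)
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report.append((precision, recall, f1, actual_k))
    return report


if __name__ == "__main__":
    # Illustrative labels only (0 = COVID-19, 1 = Normal, 2 = Viral Pneumonia).
    y_true = [0, 0, 0, 1, 1, 1, 1, 2, 2]
    y_pred = [0, 1, 0, 1, 1, 1, 0, 2, 1]
    cm = confusion_matrix(y_true, y_pred, len(CLASSES))
    for name, row in zip(CLASSES, cm):
        print(f"{name:<16} {row}")
    for name, (p, r, f1, support) in zip(CLASSES, per_class_report(cm)):
        print(f"{name:<16} precision={p:.2f} recall={r:.2f} "
              f"f1={f1:.2f} support={support}")
```

Recall on the smallest class, Viral Pneumonia, is the figure most easily hidden by overall accuracy, since a model can score well while misclassifying many of its few images.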

Future Work

To further improve this research, the following directions can be explored:

Transfer Learning: Fine-tune pretrained architectures such as EfficientNet, ResNet, or DenseNet for better accuracy and feature extraction.
Model Interpretability: Use Grad-CAM or SHAP to visualize which parts of the lungs the model focuses on when making decisions.
Class Imbalance Handling: Apply techniques such as focal loss or class weighting to balance the learning process across underrepresented classes.
Deployment: Convert the model to TensorFlow Lite or ONNX for real-time inference in mobile or clinical environments.
Broader Dataset: Include CT scans or datasets from different sources to enhance robustness and reduce bias.
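For the class-weighting direction, one common choice is "balanced" weights, w_c = N / (K · n_c), where N is the total number of images, K the number of classes, and n_c the number of images in class c (the formula scikit-learn uses for `class_weight="balanced"`). The sketch below applies it to the full-dataset counts from the Dataset section; in practice the training-split counts would be used, and a stratified 80:20 split keeps nearly the same proportions. The result is a `{class_index: weight}` dictionary, the form accepted by the `class_weight` argument of Keras `Model.fit`, assuming class indices follow the alphabetical folder order that Keras directory loaders use. Focal loss, the other option named above, is not shown.

```python
# Counts from the Dataset section (full dataset, before the train/val split).
CLASS_COUNTS = {"COVID": 3616, "Normal": 10192, "Viral Pneumonia": 1345}


def balanced_class_weights(counts: dict) -> dict:
    """Return {class_index: weight} with weight = N / (K * n_c).

    Class indices follow sorted (alphabetical) class names, matching how
    Keras directory loaders assign labels to sub-folders.
    """
    total = sum(counts.values())
    k = len(counts)
    return {i: total / (k * counts[name])
            for i, name in enumerate(sorted(counts))}


if __name__ == "__main__":
    weights = balanced_class_weights(CLASS_COUNTS)
    for i, name in enumerate(sorted(CLASS_COUNTS)):
        print(f"{i} {name:<16} weight={weights[i]:.3f}")
    # Hypothetical usage with a compiled Keras model:
    # model.fit(train_data, validation_data=val_data, class_weight=weights)
```

With these counts each class contributes the same total weight (n_c · w_c = N / K), so a misclassified Viral Pneumonia image costs roughly 7.6 times as much as a misclassified Normal one during training.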
