A survey on the use of a Vision Transformer and a custom-built CNN for classifying ultrasound images of breast tissue
Abstract
Recently, the scientific community has focused on developing convolutional neural network (CNN) algorithms to enhance the ability of medical tools to diagnose breast cancer. This study evaluates a powerful, more recent deep learning technique based on Vision Transformers (ViT), in which images are pre-processed by dividing them into patches, alongside a custom-built CNN. The performance of both models was tested on the following class combinations: healthy/benign, healthy/malignant, benign/malignant, and healthy/benign/malignant. The images used for classification were sourced from the BUS-BRA dataset available on Kaggle. We observed that the ViT performed better whenever the benign class was included in the classification task. Although the obtained accuracies are modest, it is noteworthy that, in this relatively unexplored setting, the ViT reached a classification accuracy of 78% for the benign/malignant pair and 75% for the healthy/benign pair.
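To make the two approaches mentioned above concrete, the sketch below illustrates, in generic PyTorch, the patch-splitting front end characteristic of a ViT and a deliberately small CNN baseline for two-class ultrasound images. It is a minimal, hypothetical example only: the layer sizes, image resolution (224x224 grayscale), and class names are assumptions for illustration and do not reproduce the architectures evaluated in this study.

```python
# Illustrative sketch only; layer sizes and hyperparameters are assumed, not the paper's.
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into non-overlapping patches and project each patch to a vector,
    as done at the input of a Vision Transformer."""
    def __init__(self, patch_size=16, in_channels=1, embed_dim=256):
        super().__init__()
        # A strided convolution is the standard way to extract and project patches in one step.
        self.proj = nn.Conv2d(in_channels, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                      # x: (B, C, H, W)
        x = self.proj(x)                       # (B, embed_dim, H/ps, W/ps)
        return x.flatten(2).transpose(1, 2)    # (B, num_patches, embed_dim)

class SmallCNN(nn.Module):
    """A simple custom CNN baseline for binary ultrasound classification (assumed design)."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 56 * 56, num_classes)  # assumes 224x224 input

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

if __name__ == "__main__":
    dummy = torch.randn(4, 1, 224, 224)        # batch of grayscale ultrasound-sized images
    print(PatchEmbedding()(dummy).shape)       # torch.Size([4, 196, 256]) -> 196 patch tokens
    print(SmallCNN()(dummy).shape)             # torch.Size([4, 2]) -> e.g. benign vs malignant logits
```

In a full ViT, the patch tokens produced by `PatchEmbedding` would be combined with positional embeddings and passed through transformer encoder blocks before classification; the sketch stops at the patch-splitting step the abstract refers to.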