A survey on the use of a Vision Transformer and a custom-built CNN for classifying ultrasound images of breast tissue
Abstract
Recently, the scientific community has focused on developing convolutional neural network (CNN) algorithms to enhance the ability of medical tools to diagnose breast cancer. This study evaluates a powerful, more recent deep learning technique based on Vision Transformers (ViT), in which images are pre-processed by dividing them into patches, alongside a custom-built CNN. The performance of both models was tested on the following class combinations: healthy/benign, healthy/malignant, benign/malignant, and healthy/benign/malignant. The images used for classification were sourced from the BUS-BRA dataset available on Kaggle. We observed that the ViT performed better whenever the benign class was included in the classification task. Although the obtained accuracies are modest, it is noteworthy that, in this relatively unexplored setting, the ViT reached a classification accuracy of 78% for the benign/malignant pair and 75% for the healthy/benign pair.
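To make the two approaches mentioned above concrete, the sketch below illustrates, in generic PyTorch, the patch-splitting front end characteristic of a ViT and a deliberately small CNN baseline for two-class ultrasound images. It is a minimal, hypothetical example only: the layer sizes, image resolution (224x224 grayscale), and class names are assumptions for illustration and do not reproduce the architectures evaluated in this study.

```python
# Illustrative sketch only; layer sizes and hyperparameters are assumed, not the paper's.
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into non-overlapping patches and project each patch to a vector,
    as done at the input of a Vision Transformer."""
    def __init__(self, patch_size=16, in_channels=1, embed_dim=256):
        super().__init__()
        # A strided convolution is the standard way to extract and project patches in one step.
        self.proj = nn.Conv2d(in_channels, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                      # x: (B, C, H, W)
        x = self.proj(x)                       # (B, embed_dim, H/ps, W/ps)
        return x.flatten(2).transpose(1, 2)    # (B, num_patches, embed_dim)

class SmallCNN(nn.Module):
    """A simple custom CNN baseline for binary ultrasound classification (assumed design)."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 56 * 56, num_classes)  # assumes 224x224 input

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

if __name__ == "__main__":
    dummy = torch.randn(4, 1, 224, 224)        # batch of grayscale ultrasound-sized images
    print(PatchEmbedding()(dummy).shape)       # torch.Size([4, 196, 256]) -> 196 patch tokens
    print(SmallCNN()(dummy).shape)             # torch.Size([4, 2]) -> e.g. benign vs malignant logits
```

In a full ViT, the patch tokens produced by `PatchEmbedding` would be combined with positional embeddings and passed through transformer encoder blocks before classification; the sketch stops at the patch-splitting step the abstract refers to.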