This repository contains the implementation of the hybrid ViT-CNN model developed to classify mental stability from voice data. The associated research paper is published in the Journal of Voice.
📄 Research Paper
Title: Mental Health Diagnosis From Voice Data Using Convolutional Neural Networks and Vision Transformers
Authors: Rafiul Islam, Dr. Md. Taimur Ahad, et al.
Journal: Journal of Voice (Q1 Journal)
Access the paper: https://doi.org/10.1016/j.jvoice.2024.10.010
📊 Project Overview
Mental health diagnostics are traditionally subjective, relying on self-reports and clinician observations. This study proposes a voice-based diagnostic approach using a hybrid Vision Transformer (ViT) and Convolutional Neural Network (CNN) model to classify mental stability.
Key Highlights:
- Feature Extraction: Log-mel spectrograms generated from audio signals.
- Model Architecture: Combines CNNs for local feature extraction and Vision Transformers for capturing long-range dependencies.
- Performance: Achieved 91% classification accuracy.
- Dataset: 85 recordings collected ethically from Bangladeshi participants.
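The feature-extraction highlight above can be illustrated with a minimal NumPy-only sketch of a log-mel spectrogram. It is not the notebook's code: the project lists Librosa for this step, and the frame size, hop length, number of mel bands, the HTK-style mel formula, and every function name below are assumptions chosen for illustration.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel formula (an assumption; libraries may use other variants).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters spaced evenly on the mel scale.

    Returns an array of shape (n_mels, n_fft // 2 + 1).
    """
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    bank = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        low, center, high = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - low) / (center - low)
        falling = (high - fft_freqs) / (high - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def log_mel_spectrogram(signal, sr, n_fft=1024, hop=256, n_mels=64):
    """Return a log-mel spectrogram in dB, shape (n_mels, n_frames)."""
    signal = np.asarray(signal, dtype=float)
    if signal.size < n_fft:
        # Zero-pad very short clips so at least one frame exists.
        signal = np.pad(signal, (0, n_fft - signal.size))
    n_frames = 1 + (signal.size - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2        # (n_frames, bins)
    mel_power = mel_filterbank(sr, n_fft, n_mels) @ power.T  # (n_mels, n_frames)
    # Floor the power before taking the log to avoid log(0).
    return 10.0 * np.log10(np.maximum(mel_power, 1e-10))

# Synthetic one-second 440 Hz tone standing in for a voice recording.
sr = 16000
t = np.arange(sr) / sr
spec = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr)
```

The resulting 2-D array can be treated as a single-channel image, which is the kind of input a CNN or ViT front end consumes.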
📋 Requirements
- Python 3.8 or above
- TensorFlow, Keras, Librosa, Matplotlib, Seaborn, NumPy, pandas, scikit-learn, tqdm, imbalanced-learn (provides SMOTE)
🛠 How to Run
- Install Dependencies:
  - pip install librosa matplotlib numpy seaborn tensorflow keras pandas scikit-learn tqdm imbalanced-learn
- Prepare Dataset:
  - Download the dataset from https://data.mendeley.com/datasets/s5j25b5tjk/1
  - Place the stable and unstable audio files in the data/mentally_stable/ and data/mentally_unstable/ folders, respectively.
- Run Notebook:
  - Open and execute notebooks/Updated_ViT_CNN_Model.ipynb.
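Before running the notebook, the folder layout from the Prepare Dataset step can be sanity-checked with a short script. The sketch below is hypothetical and not part of the repository: the function name, the label mapping (0 for stable, 1 for unstable), and the accepted file extensions are assumptions.

```python
from pathlib import Path

# Folder names come from the setup steps; the numeric labels are an assumption.
CLASS_DIRS = {"mentally_stable": 0, "mentally_unstable": 1}
# Assumed audio formats; adjust to match the downloaded dataset.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg"}

def list_labeled_audio(data_root):
    """Return a list of (path, label) pairs found under data_root."""
    root = Path(data_root)
    samples = []
    for folder, label in CLASS_DIRS.items():
        class_dir = root / folder
        if not class_dir.is_dir():
            raise FileNotFoundError(f"Expected folder not found: {class_dir}")
        for path in sorted(class_dir.iterdir()):
            # Skip non-audio files such as notes or hidden system files.
            if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS:
                samples.append((path, label))
    return samples
```

Calling list_labeled_audio("data") and counting the entries per label gives a quick check that both classes were placed in the right folders.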
📊 Results
Accuracy: 91%
📜 Citation
If you use this work, please cite our paper:
- Islam, R., Ahad, M. T., Ahmed, F., Song, B., & Li, Y. (2024). Mental health diagnosis from voice data using convolutional neural networks and vision transformers. Journal of Voice. DOI:10.1016/j.jvoice.2024.10.010
🧑‍💻 Author
- Rafiul Islam
- Researcher at Daffodil International University