Mental_Health_Diagnosis_Voice_with_ViT-CNN

Mental Health Diagnosis From Voice Data Using Convolutional Neural Networks and Vision Transformers

This repository contains the implementation of the hybrid ViT-CNN model developed to classify mental stability from voice data. The associated research paper has been published in the Journal of Voice.


📄 Research Paper

Title: Mental Health Diagnosis From Voice Data Using Convolutional Neural Networks and Vision Transformers
Authors: Rafiul Islam, Dr. Md. Taimur Ahad, et al.
Journal: Journal of Voice (Q1 Journal)
Access the Paper Here


📊 Project Overview

Mental health diagnostics are traditionally subjective, relying on self-reports and clinician observations. This study proposes a voice-based diagnostic approach using a hybrid Vision Transformer (ViT) and Convolutional Neural Network (CNN) model to classify mental stability.

Key Highlights:


📋 Requirements


🛠 How to Run

  1. Install Dependencies:
    • pip install librosa matplotlib numpy seaborn tensorflow keras pandas scikit-learn tqdm imbalanced-learn
  2. Prepare Dataset:
    • Download the dataset from Mendeley Data: https://data.mendeley.com/datasets/s5j25b5tjk/1
    • Place the stable audio files in data/mentally_stable/ and the unstable audio files in data/mentally_unstable/.
  3. Run Notebook:
    • Open and execute notebooks/Updated_ViT_CNN_Model.ipynb.
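
Before running the notebook, it can help to confirm that the dataset folders from step 2 are laid out as expected. The following is a minimal sketch, not code from the notebook: the helper names `index_dataset` and `class_counts`, the 0/1 label values, and the list of audio extensions are all assumptions made for illustration. Only the two folder names come from the steps above.

```python
from pathlib import Path

# Folder names come from the README; the label values (0/1) and the
# extension list are assumptions made for illustration.
CLASS_DIRS = {"mentally_stable": 0, "mentally_unstable": 1}
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg"}


def index_dataset(data_root="data"):
    """Return a list of (path, label) pairs for every audio file found."""
    root = Path(data_root)
    samples = []
    for folder, label in CLASS_DIRS.items():
        class_dir = root / folder
        if not class_dir.is_dir():
            raise FileNotFoundError(f"Expected folder is missing: {class_dir}")
        # Sort for a reproducible order; skip non-audio files such as notes.
        for path in sorted(class_dir.iterdir()):
            if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS:
                samples.append((path, label))
    return samples


def class_counts(samples):
    """Count samples per label, useful for spotting class imbalance."""
    counts = {label: 0 for label in CLASS_DIRS.values()}
    for _, label in samples:
        counts[label] += 1
    return counts
```

Calling `class_counts(index_dataset("data"))` from the repository root returns the number of files per class, and a `FileNotFoundError` indicates that one of the two folders is missing or misnamed.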

📊 Results

Accuracy: 91%

Key Visualizations:


📜 Citation

If you use this work, please cite our paper:


🧑‍💻 Author

