PDMR 2025: Privacy-Preserved Multimodal Depression Recognition Challenge
Depression has become a major global public health concern, underscoring the urgent need for effective and reliable automatic recognition methods. However, in real-world scenarios, patients are often reluctant to disclose sensitive personal information, making privacy-preserving depression recognition particularly important.
The Privacy-Preserved Multimodal Depression Recognition Challenge (PDMR 2025) invites researchers and developers worldwide to advance multimodal learning techniques for depression recognition. Participants will tackle the task using audio and video data. To protect privacy, facial regions in the video recordings are deliberately masked, requiring teams to move beyond traditional facial expression analysis and instead explore innovative strategies that leverage voice characteristics, body movements, and temporal dynamics.
This competition emphasizes challenges such as privacy protection, missing modalities, cross-modal fusion, and robustness, encouraging participants to propose solutions that are not only technically effective but also ethically sound. Ultimately, PDMR 2025 aims to foster the development of depression recognition systems that are more generalizable, interpretable, and practically applicable.
Organizing Institutions
Artificial Intelligence Research Institute
Shenzhen MSU-BIT University
Guangdong-Hong Kong-Macao Joint Laboratory of Emotional Intelligence and Pervasive Computing
Important Dates
Registration Deadline
Complete registration before this date
Dataset Release
Competition dataset will be available
Results Submission
Submit your competition results
Results Announcement
Winners announced at CloudCom 2025, Shenzhen, China
Awards & Prizes
Dataset Information
Video Data
The raw video data is processed using OpenFace to extract N-dimensional features. The data is stored in .csv files with a shape of [T, N], where:
- T denotes the number of frames
- N denotes the number of features
Audio Data
The raw audio is first processed with log-Mel filters to extract features of shape [T, 48], and then compressed into a fixed-length vector of [1, 768] using NetVLAD with 16 cluster centers. The data is stored in .csv files with a shape of [1, 768].
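The dimensions above are consistent: NetVLAD with 16 cluster centers over 48-dimensional log-Mel frames yields 16 × 48 = 768 values. The exact NetVLAD variant used to produce the released features is not specified; the following is a minimal numpy sketch of the aggregation step under common assumptions (soft assignment via exponentiated negative squared distances, intra-cluster and global L2 normalization; the cluster centers and `alpha` here are illustrative placeholders, not the challenge's actual parameters):

```python
import numpy as np

def netvlad_pool(features, centers, alpha=1.0):
    """Aggregate [T, 48] log-Mel frames into a [1, 768] vector (NetVLAD sketch).

    features: [T, D] frame features (D = 48)
    centers:  [K, D] cluster centers (K = 16), so K * D = 768
    alpha:    softness of the cluster assignment (illustrative value)
    """
    # Soft-assignment of each frame to each cluster (T x K),
    # shifted by the row minimum for numerical stability.
    dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    assign = np.exp(-alpha * (dists - dists.min(axis=1, keepdims=True)))
    assign /= assign.sum(axis=1, keepdims=True)

    # Accumulate assignment-weighted residuals per cluster, then flatten.
    residuals = features[:, None, :] - centers[None, :, :]       # T x K x D
    vlad = (assign[:, :, None] * residuals).sum(axis=0)          # K x D
    vlad /= np.linalg.norm(vlad, axis=1, keepdims=True) + 1e-12  # intra-norm
    vlad = vlad.reshape(1, -1)                                   # 1 x (K*D)
    return vlad / (np.linalg.norm(vlad) + 1e-12)                 # global L2

# Toy example with random data (T = 500 frames).
rng = np.random.default_rng(0)
feats = rng.normal(size=(500, 48))
centers = rng.normal(size=(16, 48))
print(netvlad_pool(feats, centers).shape)  # (1, 768)
```

In practice participants consume the precomputed [1, 768] vectors from the .csv files directly; the sketch only illustrates how a variable-length sequence collapses to that fixed length.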
Data Structure
train/
├── video/
│ ├── 001.csv
│ ├── 002.csv
│ └── ...
└── audio/
├── 001.csv
├── 002.csv
└── ...
test/
├── video/
└── audio/
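Given this layout, each subject's two modalities share an ID and can be loaded as a pair. A minimal loader sketch (the `root` path and the assumption that the .csv files carry no header row are hypothetical; adjust `header=` if the released files include column names such as OpenFace feature headers):

```python
import os

import pandas as pd

def load_subject(root, subject_id):
    """Load one subject's paired modalities from a 'train' or 'test' split.

    root:       path to the split directory, e.g. 'train' (placeholder)
    subject_id: zero-padded ID such as '001'
    Returns (video, audio): video is [T, N], audio is [1, 768].
    """
    video = pd.read_csv(os.path.join(root, "video", f"{subject_id}.csv"),
                        header=None).to_numpy()
    audio = pd.read_csv(os.path.join(root, "audio", f"{subject_id}.csv"),
                        header=None).to_numpy()
    return video, audio
```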
Competition Task
Task Overview
The goal of this binary classification task is to use artificial intelligence algorithms to distinguish healthy subjects from patients with depression based on the provided multimodal data.
Input
A multimodal data fragment for one subject (a .csv file containing a 128×500 matrix)
Output
The predicted depression status ("Depression" or "Normal") of the subject to whom the data segment belongs
Output Format
{"data_id": "001", "status": "Depression"}
{"data_id": "002", "status": "Normal"}
{"data_id": "003", "status": "Normal"}
{"data_id": "004", "status": "Depression"}
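A submission file in this format can be produced as follows; this is a minimal sketch, and the output filename `results.jsonl` and the `pred_by_id` mapping are illustrative placeholders:

```python
import json

def write_predictions(pred_by_id, path):
    """Write predictions as JSON Lines: one record per data segment.

    pred_by_id: dict mapping data_id (e.g. '001') to 'Depression' or 'Normal'
    """
    with open(path, "w", encoding="utf-8") as f:
        for data_id in sorted(pred_by_id):
            record = {"data_id": data_id, "status": pred_by_id[data_id]}
            f.write(json.dumps(record) + "\n")

# Illustrative predictions only.
write_predictions({"001": "Depression", "002": "Normal"}, "results.jsonl")
```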
Evaluation
Participants are required to complete all the tasks and submit a JSONL file of test results for each task. Submissions will be evaluated by F1 score.
The F1 score combines precision and recall:
- Precision is the ratio of true positives (TP) to all predicted positives (TP + FP)
- Recall is the ratio of true positives to all actual positives (TP + FN)
- F1 is their harmonic mean: F1 = 2 × Precision × Recall / (Precision + Recall)
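The metric above can be checked locally with a short sketch (treating "Depression" as the positive class; whether the organizers macro-average or score the positive class only is an assumption here, so verify against the official scoring script):

```python
def f1_score(y_true, y_pred, positive="Depression"):
    """F1 for the positive class: 2 * P * R / (P + R).

    Assumes 'Depression' is the positive label (an assumption,
    not confirmed by the challenge description).
    """
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0

# Toy example: 1 TP, 1 FP, 1 FN -> P = R = 0.5 -> F1 = 0.5
truth = ["Depression", "Normal", "Depression", "Normal"]
preds = ["Depression", "Depression", "Normal", "Normal"]
print(f1_score(truth, preds))  # 0.5
```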