Emotional Intelligence Challenge for LLMs in Long-Context Interaction
Large language models (LLMs) have made significant progress in emotional intelligence (EI) and long-context understanding. To more fully realize the potential of these models, we are organizing the Emotional Intelligence Challenge for LLMs in Long-Context Interaction. The competition is based on the LongEmotion benchmark and aims to rigorously evaluate the emotional intelligence of models across multiple dimensions within extended textual contexts.
Organizing Institutions
Artificial Intelligence Research Institute, Shenzhen MSU-BIT University
Guangdong-Hong Kong-Macao Joint Laboratory for Emotional Intelligence and Pervasive Computing
Special Session Chair
Mohamed Jamal Deen
Important Dates
Registration Deadline
Complete registration before this date
Test Set Release
Competition test set will be available
Results Submission
Submit your competition results
Results Announcement
Winners announced at CloudCom 2025, Shenzhen, China
Awards & Prizes
Competition Tasks
Emotion Classification (EC)
This task requires the model to identify the emotional category of a target entity within long-context texts that contain lengthy spans of context-independent noise. Performance is measured by whether the predicted label matches the ground-truth label.
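As an illustration of label-matching evaluation, the following minimal sketch computes the fraction of predictions that agree with the ground truth. The function names, the case-insensitive comparison, and the example labels are assumptions made for illustration; the official scorer may normalize or aggregate differently.

```python
def normalize_label(label: str) -> str:
    """Lowercase and strip whitespace so 'Joy ' and 'joy' compare equal (assumed normalization)."""
    return label.strip().lower()


def classification_accuracy(predictions: list[str], gold: list[str]) -> float:
    """Return the fraction of examples whose predicted label equals the ground truth."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(
        normalize_label(p) == normalize_label(g) for p, g in zip(predictions, gold)
    )
    return correct / len(gold)


if __name__ == "__main__":
    # Hypothetical labels; three of the four predictions match.
    preds = ["Joy", "anger", "sadness", "fear"]
    gold = ["joy", "anger", "fear", "fear"]
    print(classification_accuracy(preds, gold))
```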
Emotion Detection (ED)
The model is given N+1 emotional segments. Among them, N segments express the same emotion, while one segment expresses a unique emotion. The model is required to identify the single distinctive emotional segment. During evaluation, the model’s score depends on whether the predicted index matches the ground-truth index.
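The structure of an ED instance can be sketched as follows: one outlier segment is placed among N segments that share an emotion, and a prediction scores 1 only if its index equals the ground-truth index. The helper names, the 0-based indexing, and the random placement are assumptions for illustration, not the benchmark's actual construction procedure.

```python
import random


def build_detection_instance(
    common: list[str], outlier: str, seed: int = 0
) -> tuple[list[str], int]:
    """Place the outlier among the N common segments at a random position.

    Returns the N+1 segments and the (0-based, assumed) ground-truth index.
    """
    rng = random.Random(seed)
    gold_index = rng.randrange(len(common) + 1)
    segments = common[:gold_index] + [outlier] + common[gold_index:]
    return segments, gold_index


def detection_score(predicted_index: int, gold_index: int) -> int:
    """Score 1 if the predicted index matches the ground truth, else 0."""
    return int(predicted_index == gold_index)


if __name__ == "__main__":
    # Hypothetical segments: three express the same emotion, one differs.
    same = ["I am thrilled.", "What a wonderful day.", "This is delightful."]
    segments, gold = build_detection_instance(same, "I feel so alone.", seed=1)
    print(len(segments), segments[gold])
    print(detection_score(gold, gold))
```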
Emotion QA (QA)
In this task, the model is required to answer questions grounded in long-context psychological literature. Model performance is evaluated using the F1 score between its responses and the ground truth answers.
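The task description does not say how the F1 score is computed. One common choice for question answering is token-level F1 over whitespace-separated tokens, sketched below under that assumption. The tokenization and lowercasing here are illustrative and may differ from the official scorer.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted answer and a reference answer."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical answers sharing three of four tokens.
    print(token_f1("emotional regulation reduces stress",
                   "emotional regulation lowers stress"))
```

Precision penalizes extra tokens in the response and recall penalizes missing ones, so overly long answers do not score well simply by containing the reference words.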
Emotion Conversation (MC)
The model acts as a psychological counselor, offering empathetic and context-aware emotional support. Its responses are evaluated against stage-specific criteria based on professional counseling standards, with GPT-4o used as the scorer to keep scoring consistent and scalable.
Emotion Summary (ES)
In this task, the model is required to summarize the following aspects from long-context psychological pathology reports: (1) causes, (2) symptoms, (3) treatment process, (4) illness characteristics, and (5) treatment effects. After the model generates its response, GPT-4o evaluates the response's factual consistency, completeness, and clarity with respect to the reference answer.
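A judge-based evaluation of this kind can be sketched as a prompt that lists the five aspects and the three criteria, plus a step that combines the judge's ratings. Everything below is hypothetical: the prompt wording, the function names, and the unweighted average are assumptions, and the call to GPT-4o is deliberately omitted. The organizers' actual prompt and aggregation are not described here.

```python
ASPECTS = [
    "causes",
    "symptoms",
    "treatment process",
    "illness characteristics",
    "treatment effects",
]
CRITERIA = ["factual consistency", "completeness", "clarity"]


def build_judge_prompt(summary: str, reference: str) -> str:
    """Assemble a hypothetical judge prompt; the real prompt is not public here."""
    aspect_list = ", ".join(ASPECTS)
    criteria_list = ", ".join(CRITERIA)
    return (
        f"The summary should cover these aspects: {aspect_list}.\n"
        f"Rate the summary against the reference on: {criteria_list}.\n\n"
        f"Reference:\n{reference}\n\n"
        f"Summary:\n{summary}\n"
    )


def average_rating(ratings: dict[str, float]) -> float:
    """Combine per-criterion ratings with an unweighted mean (an assumption)."""
    missing = [c for c in CRITERIA if c not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {missing}")
    return sum(ratings[c] for c in CRITERIA) / len(CRITERIA)


if __name__ == "__main__":
    prompt = build_judge_prompt("Model summary text.", "Reference summary text.")
    print(prompt)
    # Hypothetical ratings as they might be parsed from a judge's reply.
    print(average_rating({"factual consistency": 4, "completeness": 3, "clarity": 5}))
```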