1. Introduction
Emotion recognition is an important area of research for enabling effective human-computer interaction [1]. Scientific research has led to applications of emotion recognition in tasks such as examining the mental health of patients [2], safe driving of vehicles [3], and ensuring social security in public places [4]. The collective emotions of a team are the shared emotional experiences and states that emerge within a group of individuals working together towards a common goal. These emotions are not simply the sum of individual emotions but are experienced and felt collectively by the team as a whole. Research on collective behavior and group dynamics identifies the synchronous convergence of an affective response across a group of individuals using data [5]. This multimodal data may consist of facial configurations, textual sentiments [6], voice, granular data amassed from wearable devices [7], or even neurological data obtained from brain-computer interfaces [8]. Analyzing collective behavior aims to understand the emergent properties that arise from the interactions of individuals within a group [9]. These emergent properties may include collective intelligence, decision-making processes, flow, coordination, or conflict resolution [9]. Leveraging recent developments in artificial intelligence, the detection and analysis of emotions has advanced steadily through multimodal datasets, machine learning, and state-of-the-art deep learning models. Understanding collective behavior and group dynamics is crucial for improving team performance [9]. By identifying factors that facilitate or hinder effective responses across the group, interventions can be developed to enhance collaboration, decision-making, and overall group performance [9].
Facial Emotion Recognition (FER) is a computer vision task aimed at identifying and categorizing emotional expressions depicted on a human face [10]. The goal is to automate the determination of emotions in real time by analyzing facial features such as the eyebrows, eyes, and mouth, and mapping them to a set of basic emotions like anger, fear, surprise, disgust, sadness, and happiness [10]. Recently, researchers have also explored emotions through body pose and posture, emotional body language, and motion [11]. Recent improvements in human pose estimation make pose-based recognition feasible and attractive [11]. Several recent studies propose further improvements in body language prediction from RGB videos with poses calculated by OpenPose [12]. This line of work is closely related to emotion recognition: previous studies have proposed detecting psychological stress from multimodal content and recognizing affect from body movements [13]. Unconsciously shared movement patterns can reveal interpersonal relationships: from the similarity of their poses, the reciprocal attitudes of individuals can be deduced [13]. Estimating body synchronization is relevant in a variety of fields, such as synchronized swimming [14], diving [15], and group dancing [16], which can benefit from an analysis of motion and pose similarity. Organizational researchers focusing on leadership and team collaboration may be interested in studying human interactions through synchronization effects [17]. Psychological and sociological research studies similar effects as well, for example, the effect of body synchronization on social bonding and social interaction [18,19]. Interest in body synchronization stems from the objective of transferring interpersonal entanglement, a social network metric describing the relationship of individuals in their community, to human body movement. Bodily entanglement is defined as an overarching concept entailing the synchronization of bodies and their distance [20]. Entanglement as a social network metric has been shown to be an indicator of team performance, employee turnover, individual performance, and customer satisfaction [20]. The concept builds on earlier research that studied various forms of human synchronization, emotional body language, and activities that lead to a state of connection and flow between individuals [20].
Research on the estimation of body synchronization in a group of jazz musicians focuses on understanding how the musicians coordinate their movements and actions during a performance. At the same time, we also look at how this coordination relates to the overall flow, entanglement, and collective emotional behavior of the group. Glowinski et al. [21] explore the automatic classification of emotional body movements in music performances using machine learning. Their study aims to develop computational models that can recognize and classify the emotions expressed through body movements. Participants performed musical tasks while their movements were analyzed and used to train machine learning algorithms. The results demonstrate the potential of automated systems to recognize affective body movements in music, with applications in affective computing and human-computer interaction. However, research on predicting the collective emotions of teams performing music using a quantified metric for body synchronization is lacking, owing to the limited availability of reliable tools for multi-person pose synchronization [22]. Existing tools are error-prone and tailored to specific purposes, hindering comprehensive studies [22]. Furthermore, there is a lack of integrated research, both technical and conceptual, examining the intricate bodily entanglement and flow of performing groups, such as jazz orchestras, to identify factors influencing group performance. Motivated by these research problems, we examine the relationship between team entanglement and the collective emotions of a group of jazz musicians.
We study the data from a two-hour jazz rehearsal session performed by an orchestra of 19 musicians who were part of the Jazzaar festival (www.jazzaar.com). Figure 1 depicts musicians playing diverse instruments during the Jazzaar experiment. The chief contributions of the research presented in this paper are:
We developed a high-performing system for real-time estimation of multi-person pose synchronization, which detects body synchronization across diverse visual inputs and computes synchronization metrics. It leverages Lightweight OpenPose for efficient pose estimation, achieving 5-6 frames per second on a regular CPU. By analyzing pre-recorded rehearsal videos of the jazz musicians, we extract 17 body synchronization metrics encompassing arm, leg, and head movements. These metrics serve as features for our deep learning model. The system incorporates a robust synchronization metric, enabling accurate detection across various pose orientations.
To assess the relationship between facial emotions and team entanglement, we compute the Pearson correlation between facial emotions and various body synchrony scores. Additionally, we conduct a regression analysis over the time series data, using body synchrony scores as predictors and facial emotions as dependent variables. This approach allows us to estimate the impact of body synchrony on facial emotions, providing deeper insights into the connection between team dynamics and emotional expressions.
We propose a machine learning pipeline that predicts the collective emotions of the jazz musicians from body synchrony scores and achieves accurate and interpretable results.
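To make the notion of a per-limb synchrony score concrete, the sketch below compares the orientation of the same limb segment (e.g. the one underlying r_knee_to_r_ank) across two musicians using cosine similarity of the 2D keypoint vectors. This is a minimal illustration only; the joint indices, function names, and the choice of cosine similarity are assumptions for exposition, not our actual implementation.

```python
import numpy as np

def limb_vector(keypoints, joint_a, joint_b):
    """Vector from joint_a to joint_b for one person's 2D keypoints."""
    return keypoints[joint_b] - keypoints[joint_a]

def limb_sync(kp1, kp2, joint_a, joint_b):
    """Cosine similarity between the same limb segment of two people.

    Returns a value in [-1, 1]; 1 means the segments are parallel
    (identically oriented), regardless of where each person stands.
    """
    v1 = limb_vector(kp1, joint_a, joint_b)
    v2 = limb_vector(kp2, joint_a, joint_b)
    denom = np.linalg.norm(v1) * np.linalg.norm(v2)
    if denom == 0.0:  # missing or degenerate keypoints
        return 0.0
    return float(np.dot(v1, v2) / denom)

# Toy example with hypothetical joint indices: two musicians whose right
# lower legs point the same way, at different positions in the frame.
R_KNEE, R_ANK = 9, 10
p1 = {R_KNEE: np.array([0.0, 0.0]), R_ANK: np.array([0.0, 1.0])}
p2 = {R_KNEE: np.array([5.0, 0.0]), R_ANK: np.array([5.0, 1.0])}
score = limb_sync(p1, p2, R_KNEE, R_ANK)
print(score)  # 1.0 — parallel segments
```

Averaging such pairwise scores over all musician pairs and over a time window yields one synchrony series per limb segment.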
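The correlation and regression analysis of the second contribution can be sketched as follows on synthetic stand-in data (the series, effect sizes, and noise level here are illustrative assumptions, not the rehearsal data): scipy.stats.pearsonr yields the r and p values of the kind reported in the Discussion, and a first-order polynomial fit estimates the effect of a synchrony score on an emotion intensity.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
# Synthetic per-frame series: a synchrony score in [0, 1] and an
# emotion intensity that decreases with synchrony (plus noise).
sync = rng.random(500)                               # e.g. a leg-synchrony score
disgust = 0.5 - 0.4 * sync + rng.normal(0.0, 0.05, 500)

# Pearson correlation with its significance value
r, p = pearsonr(sync, disgust)
print(f"r = {r:.2f}, p < 0.0001: {p < 1e-4}")

# Simple linear regression: emotion ~ synchrony
slope, intercept = np.polyfit(sync, disgust, 1)
print(f"slope = {slope:.2f}")
```

With the simulated negative effect, r comes out strongly negative and the fitted slope recovers the assumed coefficient, mirroring how the analysis reads off direction and strength of the synchrony-emotion relationship.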
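As a hedged sketch of the prediction pipeline's general shape — 17 per-frame synchrony features predicting a collective emotion score — a simple scikit-learn pipeline on synthetic data could look like the following. The feature construction, model choice (a linear model rather than our deep learning model), and all names are assumptions for illustration.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
# Synthetic frame-level data: 17 synchrony features -> one emotion score,
# where only the first four features carry signal.
X = rng.random((600, 17))
y = 0.6 - 0.5 * X[:, :4].mean(axis=1) + rng.normal(0.0, 0.05, 600)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
model.fit(X_tr, y_tr)
r2 = model.score(X_te, y_te)
print(f"held-out R^2: {r2:.2f}")
```

A linear model like this keeps the per-feature coefficients inspectable, which is one route to the interpretability the contribution mentions.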
5. Discussion
We refer to Figure 9 for interpreting the results of our correlation analysis. For the collective emotion of disgust, the r values of r_knee_to_r_ank (-0.51), l_hip_to_l_knee (-0.39), r_hip_to_r_knee (-0.35), and l_knee_to_l_ank (-0.35) are strongly negative, indicating a negative correlation between the leg movements of the jazz musicians and the collective emotion of disgust. We also observe that neck_to_nose (-0.43) and l_eye_to_l_ear (-0.39) are negatively correlated with the disgust emotion; the movement captured here is the shaking of the head. Apart from the movements of the legs and head, we also examine the hand movements. The r values of r_elb_to_r_wri (-0.41), l_elb_to_l_wri (-0.41), r_sho_to_r_elb (-0.36), and l_sho_to_l_elb (-0.41) reveal that the hand movements of the jazz musicians are also strongly negatively correlated with the emotion of disgust. All the above correlations have a significance value (p) of less than 0.0001. We conclude that musicians engaging in jazz exhibit reduced levels of disgust and strong feelings of liking and enjoyment, and have the potential to foster a state of synchronicity and flow in which they individually and collectively experience a harmonious alignment in their thoughts, actions, and emotions. The correlation of the surprise emotion is also observed to be negative, with r values of r_elb_to_r_wri (-0.44), r_knee_to_r_ank (-0.41), neck_to_nose (-0.37), l_sho_to_r_elb (-0.36), and l_elb_to_l_wri (-0.36), each with a significance value (p) of less than 0.0001. This suggests that being less surprised, having anticipation, prior instrument practice, and being well rehearsed are directly connected to being in a state of flow and synchronization. The musicians are more likely to be entangled when they collectively practice more to attain perfect synchronization. The correlations with the emotions of sadness, anger, and fear are also negative, implying that synchronization of head, arm, and leg movements among the musicians indicates strong team entanglement and a state of flow, with the musicians feeling more joyous.