This section highlights the most relevant research on reinforcement learning applied to collaborative robotics, with a consistent focus on human factors and brain activity signals. Only research at the intersection of these subjects is analyzed in this section.
3.1. EEG-Based Brain-Computer Interface Approaches in Collaborative Robot Control
This subsection analyzes research and technological advances in capturing brain signals through EEG and applying them to collaborative robotics. The purpose is to examine how robotic systems can be made more intuitive and adaptive to the human operator.
One of the most recent studies in the field of collaborative robotics, combining brain activity measurement through EEG and machine learning, is the research published by the University of Pennsylvania in 2022. The paper discusses the use of EEG signals to measure people's trust levels in collaborative construction robots. EEG signals provide valuable information about human brain activity and cognitive states, including trust, during human-robot collaboration [
29]. EEG signals are also used to determine mental states: electroencephalography sensors can measure brainwave frequency bands such as delta, theta, alpha, beta, and gamma waves. Theta waves (4–8 Hz) appear during REM sleep, deep meditation, or flow states, whereas delta waves (0.5–4 Hz) signify dreamless sleep. Beta waves (12–38 Hz) include low beta (12–15 Hz) for idle states, beta (15–23 Hz) for attention, and high beta (23–38 Hz) for stress and difficult activities. Alpha waves (8–12 Hz) are associated with relaxed concentration. Moreover, gamma waves (25–45 Hz) provide important insights related to intentional activities, multitasking, and creativity [
29] (see Figure 7).
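As an illustration of how these frequency bands can be quantified in practice, the following sketch estimates per-band spectral power from a single EEG channel with Welch's method. It is a minimal, hypothetical example using the band limits listed above, not code from the cited study.

```python
import numpy as np
from scipy.signal import welch

# Band limits (Hz) follow the ranges described in the text.
BANDS = {
    "delta": (0.5, 4),   # dreamless sleep
    "theta": (4, 8),     # REM sleep, deep meditation, flow states
    "alpha": (8, 12),    # relaxed concentration
    "beta": (12, 38),    # idle / attention / stress sub-bands
    "gamma": (25, 45),   # intentional activity, multitasking, creativity
}

def band_powers(eeg_channel: np.ndarray, fs: float) -> dict:
    """Average spectral power of each brainwave band for one EEG channel."""
    freqs, psd = welch(eeg_channel, fs=fs, nperseg=int(2 * fs))
    return {
        name: float(psd[(freqs >= lo) & (freqs < hi)].mean())
        for name, (lo, hi) in BANDS.items()
    }

# Ten seconds of synthetic data at 256 Hz as a stand-in for a real recording.
rng = np.random.default_rng(0)
print(band_powers(rng.standard_normal(256 * 10), fs=256))
```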
However, EEG signals can be contaminated by other frequency components, both intrinsic and extrinsic, which degrades the quality of the original trust signal. To address this, some studies used a fixed-gain filtering method to reduce extrinsic components and applied independent component analysis (ICA) to remove intrinsic components from the EEG signals. Once filtering was applied, the EEG measurements were significantly cleaner [
29,
31].
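A hedged sketch of this two-stage cleaning idea is shown below: a fixed band-pass filter suppresses out-of-band (extrinsic) components, and ICA separates intrinsic artifacts such as eye blinks. The cut-off frequencies and the use of scikit-learn's FastICA are illustrative assumptions, not the exact pipeline of the cited studies.

```python
import numpy as np
from scipy.signal import butter, filtfilt
from sklearn.decomposition import FastICA

def clean_eeg(raw: np.ndarray, fs: float, lo: float = 0.5, hi: float = 45.0):
    """raw: EEG array of shape (n_channels, n_samples)."""
    # 1) Fixed band-pass filtering to attenuate extrinsic noise (e.g. mains hum).
    b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, raw, axis=1)

    # 2) ICA decomposition for intrinsic artifacts. In practice, artifactual
    #    components (blinks, muscle activity) would be identified and removed
    #    before reconstructing the signal; here only the decomposition is shown.
    ica = FastICA(n_components=raw.shape[0], random_state=0, max_iter=500)
    sources = ica.fit_transform(filtered.T).T   # (n_components, n_samples)
    return filtered, sources

# Five seconds of synthetic 8-channel data at 256 Hz.
rng = np.random.default_rng(1)
filtered, sources = clean_eeg(rng.standard_normal((8, 256 * 5)), fs=256)
```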
After noise reduction, 12 trust-related features spanning the temporal and frequency domains were extracted from the EEG signals. These features were computed from segmented EEG data and then used to train machine learning models.
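The sketch below illustrates this kind of feature extraction on segmented EEG epochs, mixing simple temporal statistics with spectral band power. The epoch length and the five example features are placeholders; the cited study's exact 12-feature set is not reproduced here.

```python
import numpy as np
from scipy.signal import welch

def segment(signal: np.ndarray, fs: float, epoch_s: float = 2.0) -> np.ndarray:
    """Split a single-channel signal into non-overlapping epochs."""
    n = int(epoch_s * fs)
    n_epochs = len(signal) // n
    return signal[: n_epochs * n].reshape(n_epochs, n)

def epoch_features(epoch: np.ndarray, fs: float) -> np.ndarray:
    # Temporal-domain descriptors.
    mean, std, ptp = epoch.mean(), epoch.std(), epoch.max() - epoch.min()
    # Frequency-domain descriptors: alpha and beta band power.
    freqs, psd = welch(epoch, fs=fs, nperseg=min(len(epoch), int(fs)))
    alpha = psd[(freqs >= 8) & (freqs < 12)].mean()
    beta = psd[(freqs >= 12) & (freqs < 38)].mean()
    return np.array([mean, std, ptp, alpha, beta])

rng = np.random.default_rng(2)
signal = rng.standard_normal(256 * 60)               # one minute of synthetic EEG
X = np.vstack([epoch_features(e, 256) for e in segment(signal, 256)])
print(X.shape)                                        # (n_epochs, n_features)
```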
To estimate trust levels in collaborative robots, several supervised learning algorithms [
32] are used, including k-nearest neighbors (k-NN), support vector machine (SVM) [
33], and random forest. Among these, k-NN outperforms the others, reaching the highest accuracy at approximately 88%.
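For illustration, the snippet below compares the three classifiers named above with cross-validation on a synthetic feature matrix. It is a generic scikit-learn sketch, and the roughly 88% accuracy reported by the study should not be expected on random data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 12))        # 12 features per EEG segment
y = rng.integers(0, 2, size=200)          # binary trust label (low / high)

models = {
    "k-NN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.2f} +/- {scores.std():.2f}")
```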
Once the machine learning algorithms are trained, a test with human participants performing building tasks can be conducted to determine trust levels in different robot collaboration scenarios. The results show that higher levels of trust are achieved when working with semi-autonomous robots, whereas working with fully autonomous robots leads to lower trust levels due to the sense of having no control over the robot. These findings underline the potential of EEG-based trust measurement in human-robot collaboration [
29,
31].
The conclusion is that EEG brainwaves make it possible to determine a person's trust in a robot while it completes a task. It is important to highlight that these experiments were carried out in a controlled virtual reality environment. It is also relevant to note that, although no reinforcement learning strategies were used, the potential of these signals as training input for reinforcement learning in collaborative robot environments deserves attention [
34].
Reinforcement learning in the context of collaborative robots and brain signals was not specifically addressed in that work. However, a very recent study explores the use of reinforcement learning to enhance human-robot collaboration in assembly tasks, focusing on dynamic task allocation that effectively balances the workload between humans and robots [
35]. Building upon this foundation, research in the field continued to evolve, modeling discomfort in human-robot collaboration and adapting the robot to individual preferences [
36]. A different investigation, however, pioneered the discipline by taking the first steps toward integrating EEG signals into reinforcement learning and robotics [
37].
Expanding further on its findings, the study [
37] explores the application of reinforcement learning algorithms in robotics, specifically in the context of robots learning to solve tasks based on reward signals obtained during task execution. In much related research, these reward signals are either modeled by programmers or provided through human supervision. However, there are situations where encoding these rewards is challenging, which motivates the suggestion of using EEG-based brain activity as reward signals.
The core idea of this article is to extract rewards from brain activity while a human observes a robot performing a task, which eliminates the need for an explicit reward model. The paper introduces the use of brain activity signals captured through EEG sensors to provide correctness feedback to a collaborative robot about a specific task, demonstrating the ability to identify and classify different error levels from the brain signals [
37].
Brain-computer interfaces (BCI) [
38] in robotics have been identified as a hot topic [
14]. EEG is highlighted as the recording method of choice due to its portability and high temporal resolution. The research also emphasizes the use of event-related potentials (ERPs) [
39] in error detection and shows how these ERPs may be automatically categorized using machine learning and signal processing methods.
The study also offers a reinforcement learning framework for learning tasks based on reward signals obtained from monitored brain activity. Q-learning [
40] is a reinforcement learning method that uses a Q-function to optimize sequential decision-making and was the algorithm of choice to demonstrate learning in real-time tasks in collaborative robotics scenarios. The results of the research suggest that EEG-based reward signals hold great potential in robot learning and task adaptation. However, the study was carried out in 2010, which is why several lines of improvement for future research are presented in the article.
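A minimal tabular Q-learning sketch in the spirit of that framework is given below. The decode_reward function is a hypothetical stand-in for a reward decoded from EEG/ErrP activity; the environment, constants, and interfaces are illustrative assumptions, not those of the cited study.

```python
import numpy as np

n_states, n_actions = 10, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1      # learning rate, discount, exploration

def decode_reward(state: int, action: int) -> float:
    """Placeholder: would return +1/-1 decoded from an EEG/ErrP classifier."""
    return 1.0 if action == state % n_actions else -1.0

rng = np.random.default_rng(4)
state = 0
for step in range(5000):
    # Epsilon-greedy action selection.
    if rng.random() < epsilon:
        action = int(rng.integers(n_actions))
    else:
        action = int(np.argmax(Q[state]))
    reward = decode_reward(state, action)
    next_state = int(rng.integers(n_states))   # toy transition model
    # Q-learning temporal-difference update.
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
    state = next_state
```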
Following the line of research presented where robots learn to adapt their behavior based on error signals generated by brain waves measured by EEG sensors, there is an investigation carried out in 2021 [
41] that extends the previous work by proposing an approach in which a robot arm is trained to play a game and then uses that learning to teach children. The training process involves automatic detection of rewards and penalties based on EEG signals, probability-based action planning, and imitation of human actions to train the children.
This research presents a specific reinforcement learning scheme. In the work carried out to teach the robot, planning is not done as in traditional RL. Normally in reinforcement learning, actions are planned from a partial model of the environment, meaning agents make decisions based on the knowledge they have acquired so far. In the proposed research, action planning takes place only after the RL algorithm has converged, that is, once it has learned enough about the environment and the available actions. This approach can be very beneficial when fast and accurate decisions are required, such as the scenario the article describes: using RL to train a robot to play a specific game [
41].
Regarding the learning approach based on error signals, and continuing with the review above, it is necessary to highlight that error-related potential (ErrP) [
42] signals represent the subjective perception of an error when a subject observes a mistake made either by a robot or by themselves [
43]. In the proposed learning case, if no error is detected, a small positive reward is given; if an error is detected, a negative reward is applied. The rewards are used to update a table of probabilities over states and actions. Once the learning phase is completed, the agent has acquired a behavior based on the probabilities in that table.
In the training phase, the objective is to update the State-Action Probability Matrix (SAPM) to optimize actions for given states. This requires error signal detection and management using classifiers [
44]. Unlike traditional BCI systems, training occurs both offline and online. Offline training involves subjects performing sessions to gather data, a portion of which (around 12,000 instances of brain signals) is used to train classifiers. Online training then adapts the SAPM using reinforcement learning. After training, the agent's behavior, particularly the robot arm's action planning, is tested. This process involves data acquisition, offline classifier training, and SAPM adaptation [
41].
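The following is a speculative sketch of how such a State-Action Probability Matrix could be adapted online from ErrP-derived feedback. The reward values, learning rate, and the errp_detected placeholder are assumptions made for illustration and do not reproduce the cited scheme.

```python
import numpy as np

n_states, n_actions = 6, 3
sapm = np.full((n_states, n_actions), 1.0 / n_actions)   # uniform initial SAPM

def errp_detected(state: int, action: int) -> bool:
    """Placeholder for the trained ErrP classifier applied to an EEG epoch."""
    return action != state % n_actions

def update_sapm(state: int, action: int, lr: float = 0.05) -> None:
    # Small positive reward when no error is observed, negative otherwise.
    reward = -1.0 if errp_detected(state, action) else 0.2
    sapm[state, action] = max(sapm[state, action] + lr * reward, 1e-6)
    sapm[state] /= sapm[state].sum()                      # keep a probability row

rng = np.random.default_rng(5)
for _ in range(2000):
    s = int(rng.integers(n_states))
    a = int(rng.choice(n_actions, p=sapm[s]))             # sample action from SAPM
    update_sapm(s, a)
print(np.round(sapm, 2))
```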
The study conducts two-stage training with the Jaco robot arm. The first phase is offline, using EEG data (18,000 instances, including ERD/ERS and ErrP signals) for classifier training. The second phase is online, adapting the SAPM with visual and audio stimuli for learning and correction. The test phase compares the performance of children trained by the robot with those trained by humans [
41].
In general, the study is quite innovative, since it detects when a user performs the task incorrectly; the robot's behavior is then modified to teach the user to perform it correctly. One aspect to highlight is that the robot's behavior is not directly driven by the EEG signals: the agent instead selects an action when it decides one is needed, either replicating one of the movements or throwing the ball again. A future line of work could be training the agent on the ball-throwing task itself and then modifying its behavior more directly according to the degree of user error [
41].
Although the previous article leaves room for improvement, other notable studies aim to modify the overall behavior of a robot based on measured EEG signals. The authors above only explored detecting errors in certain tasks to train different agents; detecting feelings and emotions, however, offers a distinctive way to modify a robot's behavior [
45].
Another interesting approach was published in 2021 under the name “Emotion-Driven Analysis and Control of Human-Robot Interactions in Collaborative Applications” [
29]. The authors focus on the behavior of a robot and its ability to adapt to different situations depending on the brain signals received from an EEG sensor. The research is based on the application of fuzzy logic rules to modify certain critical variables of the robot motion, such as speed or motion delay. The rules are designed from the outset around stress; a trial-and-error process is then carried out to establish representative relationships between the robot's speed and the user's emotions.
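To make the idea concrete, the hand-rolled sketch below maps an EEG-derived stress estimate in [0, 1] to a robot speed scaling factor through three Mamdani-style fuzzy rules with centroid defuzzification. The membership shapes, rule set, and ranges are illustrative assumptions rather than the rules used by the authors.

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function with corners a, b, c."""
    return np.maximum(np.minimum((x - a) / (b - a + 1e-9), (c - x) / (c - b + 1e-9)), 0.0)

def speed_factor(stress: float) -> float:
    speeds = np.linspace(0.0, 1.0, 101)          # candidate speed scaling values
    # Rule 1: IF stress is LOW  THEN speed is HIGH
    # Rule 2: IF stress is MED  THEN speed is MEDIUM
    # Rule 3: IF stress is HIGH THEN speed is LOW
    low = tri(stress, -0.4, 0.0, 0.5)
    med = tri(stress, 0.1, 0.5, 0.9)
    high = tri(stress, 0.5, 1.0, 1.4)
    agg = np.maximum.reduce([
        np.minimum(low, tri(speeds, 0.6, 1.0, 1.4)),
        np.minimum(med, tri(speeds, 0.2, 0.5, 0.8)),
        np.minimum(high, tri(speeds, -0.4, 0.0, 0.4)),
    ])
    return float((agg * speeds).sum() / (agg.sum() + 1e-9))   # centroid defuzzification

for s in (0.1, 0.5, 0.9):
    print(f"stress={s:.1f} -> speed factor {speed_factor(s):.2f}")
```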
The interesting aspect of this study is that the robot's behavior can be modified in a relatively simple way. While it is true that only motion-related variables are changed through the applied rules, the robot can adapt its behavior based on measured EEG signals. The experimental setup, including the EEG sensor, the collaborative robot, and the human, can be seen in
Figure 8. It is also noteworthy that no reinforcement learning is used in this research; nevertheless, the fact that brain signals can be related directly to emotions such as stress, anxiety, and depression is a very important aspect of this work that could lead to the use of these signals in reinforcement learning investigations.
Once the possibility of modifying a robot's behavior based on the emotions the user is feeling has been established, new paradigms and research areas appear. In a study carried out in June 2022 [
45], it is confirmed that it is possible to modify the way a cobot behaves based on the feelings of a human being. The intention is to adjust the robot's behavior to achieve a level of empathy; in this case, the robot acts in close accordance with the emotions felt by the human. The experiment is carried out in a simple, controlled environment, but it usefully demonstrates that it is indeed possible to modify a robot's behavior based on human brain measurements.
The most recent studies show that brainwave measurement is a reliable basis for modifying a robot's behavior. However, reinforcement learning is not an area where this approach is routinely applied, even though it allows for emergent behaviors and greater adaptability. Most of the studies that apply RL focus only on detecting execution errors to reward or penalize an agent; it could be very useful to use similar techniques to modify the behavior of a collaborative robot depending on the user's emotions.
Throughout this review, the filtering of brain signals has been covered on several occasions. It is important to note that other relevant studies propose different techniques to filter the signals and achieve the desired emotion or error-potential detection. The use of a CNN is proposed as a valid method to filter and classify EEG signals. However, since the proposed classification was binary, considering only whether the experiment was performed correctly or incorrectly, it is difficult to judge whether this would be useful for more complex experiments where different emotions need to be taken into account [
46].
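As an illustration of this kind of binary EEG classifier, the PyTorch sketch below defines a small 1-D CNN that maps multi-channel epochs to "correct" vs "incorrect" labels. The architecture, channel count, and epoch length are assumptions and do not correspond to the cited model.

```python
import torch
import torch.nn as nn

class EEGConvNet(nn.Module):
    """Tiny 1-D CNN for binary classification of EEG epochs."""
    def __init__(self, n_channels: int = 8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_channels, 16, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(32, 2)      # correct vs incorrect execution

    def forward(self, x):                       # x: (batch, channels, samples)
        return self.classifier(self.features(x).squeeze(-1))

model = EEGConvNet()
epochs = torch.randn(4, 8, 512)                 # four synthetic EEG epochs
print(model(epochs).shape)                      # torch.Size([4, 2])
```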
The filtering of EEG signals and their subsequent classification is one of the biggest challenges in this technology [
37]. In addition, research on brain signals is a relatively recent topic that has emerged over the last few years, which is why a variety of techniques have been used to quantify signals that can enable Human-Robot Interaction.
This subsection has shown that brain signals measured by EEG sensors are valid inputs for research on collaborative robotics and reinforcement learning. The following subsection discusses other relevant techniques for capturing valuable signals for Human-Robot Interaction.
3.2. Additional Human State Measuring Techniques for Collaborative Robotics
This subsection describes several techniques for measuring human state (apart from EEG) that can be applied in a human-robot environment. As a general overview,
Figure 9 provides a detailed diagram of the human body, indicating the placement of various bio-sensors. Some bio-sensor devices, as highlighted in [
47], will be explored to provide an overview of their various applications within collaborative robotics and reinforcement learning.
One of the techniques that has proven useful in Human-Robot Interaction is eye tracking. An eye tracker is a device that records and follows a person's eye movements during Human-Robot Interaction, allowing researchers to understand how a human focuses attention and responds to visual stimuli. Numerous studies have revealed that the eyes can play a significant role in communication and in anticipating the robot's actions [
48,
49].
Eye trackers have been used in different investigations to determine the actions of a collaborative robot and also to detect stress. The appeal of an eye tracker is the ease of reading the signals with the right device. The main drawback of this technology is the delay between reading and interpreting the signal, which can mean that the human is already in a completely different state. Stress detection is one of the most interesting applications of eye trackers, as it could allow the training of an RL agent along the lines of the investigations discussed in the previous paragraphs. For detecting these stress-related variables, it is important to take into account pupil diameter and the number of gaze fixations [
50].
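A toy sketch of the two indicators mentioned above is shown below: mean pupil diameter and a dispersion-based fixation count over a gaze trace. The window size and dispersion threshold are arbitrary placeholders, not values taken from the cited work.

```python
import numpy as np

def count_fixations(gaze_xy: np.ndarray, window: int = 30, max_dispersion: float = 15.0) -> int:
    """Count windows whose gaze dispersion (in pixels) stays below a threshold."""
    fixations = 0
    for start in range(0, len(gaze_xy) - window, window):
        chunk = gaze_xy[start : start + window]
        dispersion = (chunk.max(axis=0) - chunk.min(axis=0)).sum()
        if dispersion < max_dispersion:
            fixations += 1
    return fixations

rng = np.random.default_rng(6)
gaze = np.cumsum(rng.normal(0, 2, size=(3000, 2)), axis=0)   # synthetic gaze path
pupil = rng.normal(3.5, 0.2, size=3000)                      # pupil diameter in mm
print("fixations:", count_fixations(gaze), "mean pupil:", round(pupil.mean(), 2))
```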
Another technology that allows the detection of different emotions and movements is computer vision. Computer vision refers to the application of different algorithms, in combination with image and video processing techniques, to identify gestures, postures, and facial expressions in real time. The aim of these techniques is to understand and analyze human behaviors and associated emotions [
51].
One of the most interesting disciplines in computer vision is human activity recognition (HAR). These types of disciplines have wide applications in fields such as human-machine interaction, but also in robotics and video games, where HAR improves understanding of human intentions and emotions [
51,
52].
Thanks to human activity recognition, computer vision may have a place in robotics and reinforcement learning applications as shown in an interesting recent study [
53]. However, computer vision research normally targets anomaly detection, with few examples of its integration with reinforcement learning. Anomaly detection involves using computer vision techniques to identify unusual behaviors and events in videos or images, which plays a significant role in security surveillance systems [
52,
54].
Although computer vision is a widely used research technique, it has several drawbacks. Reliability under variable lighting conditions or with blurred images is usually a problem, as these systems often fail in harsh conditions. In addition, interpreting complex images or detecting objects in unusual situations still represents a significant challenge. Finally, there is a risk of bias and discrimination in computer vision systems when training datasets are not properly representative [
51].
Both eye trackers and cameras adapted for computer vision can be useful to detect human-factors-related variables; however, they depend on many conditions, such as lighting, to function properly. Another useful technique, first used years ago to measure stress and fatigue, is cardiac rhythm monitoring. A cardiological study [
55] asserts that there is a direct correlation between arrhythmias and tachycardias and the level of stress and fatigue a person is feeling. Correct measurement with the right sensors can determine the level of stress, and therefore the emotions, that a person feels when interacting with a robot [
55].
The variety of devices capable of measuring heart rate is quite wide, ranging from wearable devices such as smartwatches to body bands or stickers placed on the chest [
56]. For all of them, it is necessary to take into account limitations such as battery life, connectivity problems, signal quality, accuracy, data-handling security, and device pricing [
56].
The studies mentioned above claim that heartbeat sensors are valuable tools for monitoring stress and fatigue [
55]. That is because heart rate variability can indicate a person's emotional and physical state. However, when it comes to detecting ErrPs or evaluating complex emotions and cognitive responses, such as those captured by EEG, these sensors may not be the best choice due to their limited ability to capture detailed information about specific brain processes.
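As a simple illustration, the snippet below computes two common heart-rate-variability metrics (SDNN and RMSSD) from a series of RR intervals, the kind of signal the wearables above provide; reduced variability is commonly read as a marker of stress or fatigue. The values used here are synthetic placeholders.

```python
import numpy as np

def hrv_metrics(rr_ms: np.ndarray) -> dict:
    """Basic HRV metrics from RR intervals expressed in milliseconds."""
    diffs = np.diff(rr_ms)
    return {
        "mean_hr_bpm": 60000.0 / rr_ms.mean(),
        "sdnn_ms": rr_ms.std(ddof=1),               # overall variability
        "rmssd_ms": np.sqrt(np.mean(diffs ** 2)),   # short-term variability
    }

rng = np.random.default_rng(7)
rr = rng.normal(800, 40, size=300)                  # ~75 bpm with some variability
print(hrv_metrics(rr))
```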
In the context of exploring methodologies relevant to medical research, several technologies have been identified as effective in evaluating an individual's stress levels or physiological state in real time. Notably, the measurement of cortisol, a hormone intricately associated with the stress response, emerges as a significant area of interest [
57]. The precision offered by cortisol as a metric is high, yet the challenges lie in the actual process of measurement, particularly in achieving real-time data acquisition. Recent advancements have been made in real-time cortisol measurement, signaling a promising yet still emerging area of scientific exploration.
As research in the field advances, there is a notable increase in the variety of methods being developed to measure human interactions and stress-related emotions. Techniques such as monitoring breathing rate [
58] and observing changes in facial complexion, including blushing [
59], are indicators of stress levels. These technologies are essential in the realm of Human-Robot Interaction, providing a deeper understanding of human emotional states. Their integration into robotic systems holds the potential to significantly enhance the way robots interpret and respond to human emotions, leading to more empathetic and effective interactions.
Questionnaires and scales are self-reporting tools in which people answer questions about their emotional states and level of stress after performing specific tasks. Some of the best-known examples of self-report questionnaires are the Beck Depression Inventory (BDI), which assesses depression, and the Perceived Stress Questionnaire (PSS), which measures perceived stress. The BDI is a widely used tool for measuring the severity of depression in individuals, developed by psychologist Aaron T. Beck in 1961 [
60,
61]. The Perceived Stress Questionnaire is used to assess how stressed a person feels in relation to his or her life experiences, circumstances and tasks [
62]. Although numerous validated questionnaires already exist and achieve their intended purpose, it is always possible to create new ones to capture the particular variables of interest in a given experiment.
In conclusion, this subsection highlights various techniques for measuring human states in the context of Human-Robot Interaction. While EEG remains prominent, alternative methods such as eye tracking, computer vision, and cardiac rhythm analysis offer valuable insights. Each method has its strengths and limitations, making the choice of method context-dependent.
3.3. Immersive Technologies as a Safe Training Ground for Reinforcement Learning in Human-Interactive Robotics
Since robotics often involves risk to humans, because robots are in direct contact with the user, it is necessary to determine the practicality of virtual reality in experiments related to collaborative robotics. The aim of this subsection is also to distinguish between virtual reality, mixed reality, and augmented reality, and to analyze the possibility of applying each of them to future research projects.
Virtual Reality refers to a computer-generated environment that immerses the user in a simulated reality, often using a headset or other sensory input devices [
63]. In VR, users can interact with and experience a digitally created world that can be entirely different from the physical world [
64,
65]. Augmented Reality (AR) overlays digital information or virtual elements onto the real-world environment [
66]. AR enhances the user’s perception of the physical world by providing additional digital content or information, often through a mobile device’s camera or specialized glasses [
67,
68].
Mixed Reality (MR) combines elements of both virtual reality and augmented reality. In MR, digital objects or information are integrated into the real world in a way that allows them to interact with physical objects and the user’s environment [
69].
In the field of augmented reality and collaborative robotics, a novel solution was proposed, involving a human-robot collaboration method using augmented reality to enhance construction waste sorting, aiming to improve both efficiency and safety [
70]. Mixed reality is also becoming more popular due to some recently released products, and interesting research confirms the possibility of applying mixed reality in robotic environments to enhance the interactions between humans and robots [
71].
Virtual reality is widely used to simulate real environments, aiming to prevent harm to humans and to train users for specific tasks. It is especially prevalent in the medical field, where virtual reality trains medical students to perform complex surgeries [
72]. Additionally, this technology can be used to learn how to interact with robots in medical settings or other environments [
73].
The intersection between virtual reality and human interaction is also a hot topic. A system that combines robotics with virtual reality to improve welding, by interpreting the welder's intentions for robotic execution, was recently introduced. Utilizing virtual reality for intention recognition enables precise and smooth robotic welding, highlighting the potential of integrating robotics and virtual reality in skilled tasks [
74].
On the other hand, in order to apply reinforcement learning to train a robotic system that takes into account the different emotions or states of a human being, the ideal solution is virtual reality, as these environments meet the safety requirements to guarantee that humans are not harmed during experimentation. A recent study [
75] describes a method where the virtual reality environment adapts to participant behavior in real-time through reinforcement learning, focusing on encouraging helping behavior towards a victim in a simulated violent assault scenario.
If the robotic training has already been carried out, virtual reality is also a reliable validation technique to test the performance of the system in real-life environments while complying with the required safety conditions. Thanks to the virtual reality headsets currently on the market, environments can be generated in a very simple way. One example in which this type of technology can be useful is interaction with a factory or robotic facility, where the moving parts that could endanger human safety are simulated by the devices [
31,
76].
In several studies presented in previous sections that implemented virtual reality to provide a safe environment for experimentation, it became clear that the results are very similar to those obtainable in a real environment. Safety in robotic environments is critical, as the integrity of the users is a major concern [
76].