Child behavior recognition in social robot interaction using stacked deep neural networks and biomechanical signals


Introduction

Human-robot interaction is an interdisciplinary field gaining significant attention due to the increasing integration of robots into various aspects of daily life1,2,3,4. The focus is often on making these interactions as natural and beneficial as possible5,6,7,8. However, ensuring safety, particularly in interactions involving children, is of paramount importance9,10. It is vital to understand and monitor the range of interactions children may have with social robots, utilizing effective systems to provide objective feedback to caregivers and parents. Moreover, the integration of artificial intelligence into medical engineering has enabled the development of intelligent monitoring systems that enhance diagnosis and therapeutic outcomes across diverse healthcare applications11,12,13,14.

Human-robot interaction is a growing field with a variety of applications, ranging from entertainment and education to healthcare15,16,17. Ensuring safety in these interactions is especially crucial when children are involved. Studies have shown that children’s engagement with robots varies considerably, depending on whether they are interacting with human instructors or robotic ones16. Moreover, robots have found roles in specific educational and therapeutic contexts, including the diagnosis and treatment of autism spectrum disorders18,19,20.

While robots are increasingly being integrated into settings with children, there is a nuanced discussion about how children perceive these robots. For instance, some studies have shown that children’s perception of robots can be influenced by the robot’s own provision of information about its capabilities21. Others have explored the role of aliveness and agency in children’s interactions with both living and non-living entities22. Furthermore, the importance of play in child-robot interaction has been emphasized to suggest that robots can serve as both social agents and material objects in play settings17.

Despite growing interest in child-robot interaction research, most existing studies suffer from key limitations. Many focus primarily on emotional engagement, educational responses, or therapeutic outcomes, without incorporating robust, data-driven safety monitoring systems. Traditional behavior recognition techniques tend to rely on isolated modalities or predefined gesture sets, which limits their real-world applicability15,16. Additionally, earlier works often lack scalability, fail to generalize across diverse age groups, and rarely account for the biomechanical intricacies of child-robot dynamics. Very few studies integrate real-time behavioral feedback mechanisms using advanced ensemble machine learning techniques. This paper addresses these limitations by introducing a novel stacked DNN model that analyzes raw biomechanical sensor signals and delivers high-accuracy behavior recognition. The key novel contributions of this work are:

  • Development of a stacked DNN architecture that outperforms traditional single models and sequence-based networks in recognizing child-robot interaction behaviors.

  • Use of biomechanical sensing (accelerometer and gyroscope) to provide objective behavioral signals rather than relying solely on emotional or visual cues.

  • Introduction of Kurtosis (K) and Signal Magnitude Area (SMA) as targeted statistical features for capturing sudden and cumulative motion patterns.

  • Evaluation of the model on separated adult and child datasets, demonstrating strong generalization despite the absence of gyroscope data in the child test set.

  • Clear path forward for real-time deployment in mobile systems and biomedical robots, with use cases in therapeutic monitoring for children.

The remainder of this paper is organized as follows: Sect. Materials and methods describes the research methodology, data acquisition, feature extraction, and the utilized stacked machine learning models. Section Results and discussion elaborates on the obtained results. The future perspective of biomedical robotics is explained in Sect. Towards social robots control and biomedical robotics, and finally, Sect. Conclusions concludes the paper.

Materials and methods

Methodology

This research is geared towards developing an objective methodology to monitor and classify the dynamics of child-robot interactions. Building upon the foundational concepts introduced earlier, the study integrates hardware sensing with advanced data-driven models for a comprehensive interpretation and classification of these interactions.

The dataset provided by Alhaddad et al. (2021)23, detailing various interactions between children and robotic toys through acceleration sensors, forms the cornerstone of this work’s data input. Subsequent studies by the same authors extended this work to develop classification models using this data, demonstrating the feasibility of using artificial neural networks to distinguish between different types of interactions, including aggressive behaviors24. Furthermore, their work explored real-time responses of social robots to these interactions, evaluating the performance of machine learning techniques in this context25. This research paper will compare its findings and model performance to those reported in references 23, 24, and 25 within the discussion section.

For the proposed methodology, three distinct social robots equipped with an accelerometer and a gyroscope will be utilized. These instruments will measure time-domain signals corresponding to acceleration and angular displacement that offer a dual-modality approach to understanding the dynamics of interaction. The recorded signals will then be transformed into statistical features, serving as the input for training the stacked DNNs model. Moreover, Fig. 1 illustrates a conceptual overview of the proposed methodology. At its core, the model functions by analyzing sensor data to identify patterns indicative of a range of child-robot interactions. The goal is to discern these interactions with high accuracy and translate the findings into actionable insights for caregivers or parents through a mobile interface. It is important to note that while the real-time monitoring aspect is part of future work, the current focus of the methodology lies in the development, training, and validation of a novel stacked DNNs model capable of interpreting complex behavioral data.

Fig. 1
figure 1

Workflow diagram of the proposed research methodology.

Full size image

In the following subsection, the paper will detail the process of data collection, feature extraction, tools, and hardware used that underpin the methodology. Through this rigorous approach, the study seeks to advance the field of human-robot interaction by providing a robust framework for ensuring the safety and well-being of children in social robotic environments.

Data acquisition

To investigate the recognition of a variety of interaction types between children and social robots, a nuanced dataset was meticulously assembled, leveraging a triad of robotic toys designed with safety and manipulability in mind23. The dataset, curated to facilitate the differentiation of various behaviors, was harvested from interactions with a stuffed panda, a robot, and an excavator toy. Each toy, equipped with a data collection system comprising a Raspberry Pi and a Sense Hat sensor array, logged the subtleties of physical interactions through a high-fidelity LSM9DS1 accelerometer and gyroscope at a 30 Hz acquisition rate. The accelerometer records the vibration signals along the x, y, and z axes, while their combined effect is calculated using Eq. 1 below24.

$$\left|A\right|=\sqrt{A_{x}^{2}+A_{y}^{2}+A_{z}^{2}}$$

(1)

where $A_{x}$, $A_{y}$, and $A_{z}$ are the acceleration components along the three axes and $\left|A\right|$ is the resultant magnitude. The gyroscope records the corresponding angular displacement along the same three axes.
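As a concrete illustration, Eq. 1 reduces to a one-line NumPy expression over the three accelerometer channels. The array values below are synthetic, chosen only to make the arithmetic easy to verify:

```python
import numpy as np

def resultant_acceleration(ax, ay, az):
    """Combine the three accelerometer axes into the resultant
    magnitude |A| = sqrt(Ax^2 + Ay^2 + Az^2) of Eq. 1."""
    ax, ay, az = (np.asarray(v, dtype=float) for v in (ax, ay, az))
    return np.sqrt(ax**2 + ay**2 + az**2)

# Example: a 3-4-0 triple gives magnitude 5.
print(resultant_acceleration([3.0], [4.0], [0.0]))  # [5.]
```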

Participants, both adults and children ranging from 4 to 31 years old, were instrumental in creating this rich dataset. Adults were instructed to freely engage in a range of predefined interactions: touching, moving, throwing, lifting, and placing the toys down, plus an idle state for non-interactive scenarios. These interactions were captured in sessions across varying days, ensuring a diversity of behavioral instances. Children, engaged through imaginative scenarios, contributed to the dataset with a level of spontaneity unique to their interactions. The extraction of behavior instances was a two-tiered process. Initially, a MATLAB script delineated instances through tailored thresholds, identifying distinctive patterns inherent to each behavior—such as notable changes in acceleration indicative of different types of interactions. This automated selection was then meticulously validated through manual review, ensuring the integrity of the instances chosen. Each instance, encapsulated in a 25-sample data frame, conveys a detailed snapshot of interaction, rich in detail and ripe for analysis.
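The first tier of the extraction process can be sketched as follows. This is a simplified illustration only: the original MATLAB script used behavior-specific thresholds and the selected frames were then validated manually, so the single `threshold` value and the non-overlapping frame logic here are assumptions:

```python
import numpy as np

FRAME_LEN = 25  # samples per behavior instance, as in the original dataset

def extract_instances(resultant, threshold):
    """Cut a 25-sample frame wherever the resultant acceleration
    crosses `threshold` (non-overlapping, first-crossing-wins).
    A stand-in for the tailored thresholds of the MATLAB script."""
    frames, i = [], 0
    while i <= len(resultant) - FRAME_LEN:
        if resultant[i] > threshold:
            frames.append(resultant[i:i + FRAME_LEN])
            i += FRAME_LEN  # skip past this instance
        else:
            i += 1
    return np.array(frames)

# Synthetic signal: quiet baseline with one burst of activity.
sig = np.ones(100)
sig[40:60] = 3.0
print(extract_instances(sig, 2.0).shape)  # (1, 25)
```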

To complement these instances, artificial sequences were generated using Python to simulate the stochastic nature of real-world interactions. The sequences, crafted to mirror probable behavioral patterns, offer a practical perspective for the AI model to learn and discern nuances of child-robot interactions. Table 1 in the manuscript catalogs the frequency of each type of observed behavior. Meanwhile, Fig. 2 presents a dual-faceted view of the interactions: Fig. 2a illustrates the pattern of acceleration vectors during a randomly chosen interaction, and Fig. 2b juxtaposes this with the corresponding pattern of angular displacements as measured by the gyroscope sensor. These visualizations highlight the variances in vibration and gyroscopic readings across different interaction types. It is important to note that the dataset for child participants contains only accelerometer data, whereas the adult dataset includes both accelerometer and gyroscope signals. This limitation stems from the original dataset structure, where gyroscopic readings were not recorded during child-robot interactions.

Table 1 Behavioral instance frequencies in interaction data.

Full size table

Fig. 2
figure 2

Sensorial symphony of interaction of the first adult shaking and hitting the excavator: (a) Acceleration dynamics; (b) Gyroscopic angular displacements.

Full size image

Through this meticulous assemblage of interaction data, the study aims to illuminate the intricate dynamics of child-robot interactions, laying a foundation for the development of AI models that understand and anticipate the developmental needs and safety of children in shared spaces with social robots. The dataset used in this study was obtained from a previously published open-access article licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). All ethical approvals, data anonymization procedures, and participant consent protocols were conducted by the original authors and are outside the scope of this study. The current research focuses solely on applying a novel artificial intelligence framework to this existing dataset for behavior recognition.

Feature extraction technique

In the field of signal processing and pattern recognition, the choice of features is pivotal to the effective characterization of data. For this research, specific statistical measures were employed to abstract meaningful information from the raw time-domain signals obtained from the sensors.

Kurtosis is a measure of the tailedness of the probability distribution of a real-valued random variable. In the context of accelerometer data, high kurtosis could indicate the presence of outliers which are characteristic of sharp, sudden movements such as hits or shakes. Equation 2 for kurtosis (K) applied to the resultant acceleration is as follows26:

$$\text{Kurtosis}\ (K)=\frac{N(N+1)}{(N-1)(N-2)(N-3)}\sum_{i=1}^{N}\left(\frac{x_{i}-AM}{SD}\right)^{4}-\frac{3(N-1)^{2}}{(N-2)(N-3)}$$

(2)

where N is the number of samples, $x_{i}$ are the individual samples of the resultant acceleration, AM is the arithmetic mean of the samples, and SD is the standard deviation. This measure was chosen because it can accentuate extreme deviations in the acceleration pattern, which are indicative of aggressive or abrupt interactions with the robot.
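A direct implementation of Eq. 2 is shown below. The spike frame is synthetic, chosen to mimic the kind of impulsive, hit-like event that kurtosis is meant to flag:

```python
import numpy as np

def kurtosis_excess(x):
    """Bias-corrected excess kurtosis (Eq. 2), using the sample
    standard deviation (ddof=1) for the SD term."""
    x = np.asarray(x, dtype=float)
    n = x.size
    am, sd = x.mean(), x.std(ddof=1)
    term = ((x - am) / sd) ** 4
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))) * term.sum() \
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))

# A single sharp spike in an otherwise flat 25-sample frame yields
# very high kurtosis, consistent with a sudden hit or shake.
frame = np.zeros(25)
frame[12] = 10.0
print(kurtosis_excess(frame) > 3)  # True
```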

Moreover, SMA is another feature that provides a combined measure of the magnitude of movement across all three axes of the gyroscope sensor. It is particularly useful in distinguishing between different types of dynamic movements. The SMA is calculated using Eq. 3 below27:

$$SMA=\frac{1}{N}\sum_{i=1}^{N}\left(\left|gyro_{x_{i}}\right|+\left|gyro_{y_{i}}\right|+\left|gyro_{z_{i}}\right|\right)$$

(3)

where $\left|gyro_{x_{i}}\right|$, $\left|gyro_{y_{i}}\right|$, and $\left|gyro_{z_{i}}\right|$ are the absolute values of the angular velocity recorded by the gyroscope in the X, Y, and Z directions, respectively. SMA was selected because it effectively summarizes the overall dynamic activity captured by the gyroscope without assuming any particular direction of motion. This makes it robust to variations in the robot’s orientation and the direction of the child’s interaction with the robot. Due to the absence of gyroscopic data in the child dataset, the SMA feature was not used during the testing phase for child behavior classification. Instead, only the Kurtosis feature extracted from the accelerometer was employed. This decision was made to maintain the integrity of the dataset while enabling evaluation of the trained model’s ability to generalize from adult-based full-motion interactions to child-based limited-sensor inputs.
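Equation 3 likewise reduces to a short NumPy expression; the angular velocity values in this sketch are made up for illustration:

```python
import numpy as np

def signal_magnitude_area(gx, gy, gz):
    """SMA of one gyroscope frame (Eq. 3): the mean of the summed
    absolute angular velocities across the X, Y, and Z axes."""
    gx, gy, gz = (np.asarray(v, dtype=float) for v in (gx, gy, gz))
    return np.mean(np.abs(gx) + np.abs(gy) + np.abs(gz))

# A frame with unit-magnitude motion on every axis gives SMA = 3.
print(signal_magnitude_area([1, -1], [1, 1], [-1, 1]))  # 3.0
```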

Together, K and SMA provide a comprehensive view of the intensity and nature of the interactions. Kurtosis highlights sharp, impulsive events, while SMA captures the cumulative effect of the angular velocities, making them suitable for detecting a wide range of inappropriate behaviors.

Stacking

General overview

Machine learning has been widely adopted across various domains of behavioral analysis due to its ability to model complex nonlinear relationships28,29,30,31,32,33,34. Two artificial neural networks were used with differing hidden layer configurations to capture complementary patterns in the input features, thereby enriching the diversity of learned representations. More complex models such as GNNs, GCNNs, 1D CNNs, or Transformers were not employed, as the low-dimensional and statistical nature of the input features (Kurtosis and SMA) is better suited for lightweight fully connected architectures like ANNs, which have proven effective in similar biomedical signal analysis tasks35. Stacking, in the context of machine learning, is an ensemble learning technique that combines multiple classification or regression models to produce a meta-model that often outperforms any single contributing model36,37. It leverages the strength of each individual model to improve the final prediction accuracy. The general principle of stacking can be encapsulated in Eq. 4 below36:

$$\text{Stacked Model Output}=f\left(\text{Model}_{1}(x),\ \text{Model}_{2}(x),\ \dots,\ \text{Model}_{n}(x)\right)$$

(4)

where $x$ is the input feature vector, $\text{Model}_{i}$ represents the output of the $i$-th model, and $f$ is a function that strategically combines these outputs. In this study, the function f is realized through an additional neural network layer that learns the optimal combination of the outputs from two deep neural network models. The final classification is performed by a meta-learner neural layer that takes the outputs from the two base DNNs as input features. This layer is trained to learn the optimal combination of these outputs to generate the final behavior prediction.
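The stacking principle of Eq. 4 can be illustrated with scikit-learn. This is a hedged sketch, not the paper's exact implementation: the hyperparameters and the synthetic stand-in [Kurtosis, SMA] data are assumptions, and the paper's meta-learner neural layer is approximated here by a small MLP final estimator:

```python
import numpy as np
from sklearn.ensemble import StackingClassifier
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))            # stand-in [Kurtosis, SMA] vectors
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # placeholder behavior labels

stack = StackingClassifier(
    estimators=[
        # Two base networks with deliberately different shapes,
        # echoing the wide-then-narrow / narrow-then-wide idea.
        ("dnn_wide_narrow", MLPClassifier((32, 4), activation="tanh",
                                          solver="sgd",
                                          learning_rate_init=0.05,
                                          max_iter=500, random_state=0)),
        ("dnn_narrow_wide", MLPClassifier((4, 32), activation="tanh",
                                          solver="sgd",
                                          learning_rate_init=0.05,
                                          max_iter=500, random_state=1)),
    ],
    # The combining function f: a small neural meta-learner.
    final_estimator=MLPClassifier((8,), max_iter=500, random_state=2),
)
stack.fit(X, y)
print(round(stack.score(X, y), 2))
```

The meta-learner sees only the base models' outputs, so it learns how much to trust each network per prediction rather than re-learning the raw features.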

Utilized DNN models

Deep neural networks have been successfully applied across various domains of interest38,39,40. The investigation incorporates two DNNs, each architected with a distinct configuration to capture varied representations within the dataset. The divergence in their architectures—a strategic choice—enables each network to specialize in distinct features, thereby enriching the ensemble’s ability to generalize across the problem space. The first DNN is structured with a greater number of neurons in the initial hidden layer, fostering a broad initial representation, which is then distilled by a significantly smaller subsequent layer. Conversely, the second DNN reverses this approach, starting with a sparse representation that expands in the following layer. This architecture is hypothesized to enhance the stacked model’s performance by aggregating diverse patterns and relationships within the data. Figure 3 provides a visual representation of one of the DNNs, with Fig. 3a depicting the network and Fig. 3b its flowchart, showcasing the input layer, two hidden layers, and the output layer. This depiction elucidates the flow of information through the network and the layered approach to learning. The specifications of the two DNNs are summarized in Table 2. The table lists the characteristics of each network, including the number of hidden layers, number of neurons in each layer, optimization algorithm (Stochastic Gradient Descent (SGD-based))41,42, activation function (Tanh)43, and other pertinent parameters that define the architecture of the DNNs. The key training and validation details of the two base DNN models used in the stacking architecture are summarized in Table 3, where different portions of the dataset were used for training and testing. Basic manual hyperparameter tuning was applied to adjust learning rate, layer structure, and neuron count to balance performance and convergence while minimizing overfitting.

Fig. 3
figure 3

Model architecture: (a) Architecture of a DNN of two hidden layers; (b) Block diagram of the DNN.

Full size image

Table 2 Specifications of the adopted models.

Full size table

Table 3 Training and validation parameters for the utilized DNN models.

Full size table

Performance evaluation metrics

While the model was trained using both SMA and Kurtosis features from the adult dataset, testing on the child dataset was carried out using only the Kurtosis feature. This deliberate feature exclusion reflects real-world limitations where full sensor coverage may not always be available, especially in pediatric scenarios. The model’s ability to still classify behavior types with reasonable accuracy using a single statistical feature demonstrates the robustness and transferability of the learned patterns. In the context of the proposed stacked DNN model, evaluating its efficacy involves several key metrics. These metrics provide insight into various aspects of the model’s performance, from its overall accuracy to the balance between precision, recall, and F1 score. Table 4 lists the formulas for the adopted evaluation metrics.

Table 4 Statistical features for performance evaluation of the proposed stacking approach.

Full size table

where TP, FP, TN, and FN represent the true positives, false positives, true negatives, and false negatives, respectively.
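As a quick reference, the standard metric formulas of Table 4 can be implemented directly from the four confusion counts. The example counts below are hypothetical, not the paper's results:

```python
def evaluation_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, and F1 from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts: 90 TP, 6 FP, 90 TN, 6 FN.
print(evaluation_metrics(90, 6, 90, 6))
# (0.9375, 0.9375, 0.9375, 0.9375)
```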

Results and discussion

Visualization and feature-related analysis

Understanding the intricate structure of the dataset necessitates an informative visual representation. In Fig. 4, the visualization techniques employed allow for a nuanced interpretation of the accelerometer and gyroscope data based on their respective values. Specifically, Fig. 4a categorizes the data by participants’ behaviors toward the robots, whereas Fig. 4b organizes the same data based on the contributions of different adult participants. This delineation into clusters elucidates the relationship between the resultant acceleration values, measured in g units, and the angular displacement degrees, encompassing pitching, rolling, and yawing motions.

To delve deeper into the dataset’s characteristics, Fig. 5 employs a circular visualization strategy across a quadruple-axis framework, thereby enhancing the interpretability of the previously depicted data. Figure 5a and b, through their distinct categorizations – one by behavior type and the other by adult participant contributions – reveal a pronounced gyroscopic influence in contrast to the accelerometer readings, where the resultant acceleration is derived.

Fig. 4
figure 4

Free visualization of the acceleration and angular displacement depending on: (a) Behavior type; (b) Participating adult.

Full size image

Fig. 5
figure 5

Circular visualization of the accelerometer and gyroscope data categorized upon: (a) Type of behavior; (b) adults who participated.

Full size image

Transitioning from raw data to feature analysis, the dataset was refined into two primary statistical features: K and SMA. Figure 6 presents a scatter plot of a randomly selected sample from these statistical features, categorized by types of inappropriate behaviors. The tighter clustering of SMA values suggests a more robust correlation with specific behaviors, indicating an advantageous feature for the model. Conversely, the widespread distribution of K values presents a more significant challenge for the efficacy of the stacked model. The relative distribution density of these features underscores the differential impact on the model’s predictive capabilities. As shown in Fig. 6b, the SMA values derived from gyroscope readings exhibit tighter clustering across behavior categories than the kurtosis values, indicating that gyroscopic data alone (specifically the SMA feature) can effectively support accurate classification of child-robot interaction behaviors.

Fig. 6
figure 6

Scattering plot of a random sample recordings for the six different behavior types: (a) Kurtosis of resultant acceleration; (b) SMA of the gyroscopic angular displacements.

Full size image

Stacking model results and assessments

The implementation of the stacked model was subjected to an empirical evaluation using the two distinct test datasets described earlier in the Data acquisition section, one comprising adult-generated interactions and the other child-generated interactions. The performance metrics derived from these evaluations are presented in Fig. 7 and elucidate the efficacy of the model across different age demographics.

For the adult test dataset, the model demonstrated a high degree of accuracy, with a value of 0.941. This high accuracy is mirrored in both precision and recall, which stand at 0.94 and 0.941 respectively, indicating a balanced capacity of the model to correctly identify positive instances while minimizing false positives. The F1 Score, which harmonizes precision and recall, was calculated to be 0.939, underscoring the model’s robustness in handling the adult-generated data. In contrast, when the model was applied to the child test dataset, there was a slight decrease in performance across all metrics, which could be attributed to the inherent variability in children’s interactions with the robots. The accuracy obtained was 0.882, coupled with equivalent values for precision and recall. The F1 Score for the child dataset was observed to be 0.881, which, while lower than that of the adult dataset, still reflects a commendable performance, considering the unpredictability and diverse nature of children’s behavior. The decreased performance on the child dataset can be attributed to the inherent unpredictability and spontaneity of children’s physical behavior, which introduces greater variability in sensor readings. Additionally, the absence of gyroscope data in the child dataset limits the model’s ability to capture rotational motion dynamics, which reduces its feature richness during classification. It should be noted that the child dataset was tested using only the Kurtosis feature due to the absence of gyroscope data, while adult dataset testing was conducted using both Kurtosis and SMA, consistent with the training configuration.

For better visualization, the confusion matrices for both test datasets are depicted in Fig. 8. Figure 8a represents the stacked model performance confusion matrix of the adult dataset, while Fig. 8b shows the stacking technique assessment confusion matrix of the children dataset. It can be seen that the children dataset is somewhat harder to classify, as the model was trained on the adults’ intended inappropriate behaviors. This variability is reflected in the confusion matrix of the child dataset, where certain behavior classes exhibit more frequent misclassifications compared to the adult dataset.

These results collectively affirm the model’s generalizability and its potential applicability in real-world scenarios where the behavior of different user groups must be accurately interpreted by social robots. The slight discrepancies observed between the two datasets highlight the importance of fine-tuning and possibly customizing the model to accommodate the distinct behavioral patterns exhibited by children. The reduction in performance on the child dataset is partly attributable to the limited feature availability. Specifically, during testing, the model processed only the Kurtosis-based accelerometer feature, as gyroscopic data (and thus SMA) was not available for children. This setup was intentionally adopted to simulate real-world scenarios where wearable or embedded systems for child monitoring might have only a single motion sensor. Despite this, the model achieved a commendable level of accuracy, affirming its capacity to generalize from rich training data to reduced-feature testing environments.

Fig. 7
figure 7

Performance evaluation metrics of the stacked model on the adult and child test datasets.

Full size image

Fig. 8
figure 8

Confusion matrix for the stacking model using: (a) Adult dataset; (b) Children dataset.

Full size image

The performance of the proposed stacking model is benchmarked against other well-known architectures in Table 5, evaluated separately on adult and child datasets. As shown in Table 5, the proposed stacked DNN model achieves the highest accuracy of 0.941 on the adult dataset, outperforming the Transformer (0.926), LSTM (0.913), and Single DNN (0.901). It also demonstrates a superior F1 Score (0.939) compared to the Transformer (0.923), LSTM (0.911), and DNN (0.896), confirming its improved precision-recall balance. On the more challenging child dataset, where behavioral variability affects classification, the stacked model maintains the lead with an accuracy of 0.882, while the Transformer achieves 0.874, LSTM 0.869, and DNN 0.851. Similarly, the F1 Score of the stacked model on child data (0.881) remains notably higher than that of the other models. These results emphasize the robustness and generalization capability of the proposed ensemble approach, particularly under variable behavioral patterns. The superior performance of the stacked ANN over the LSTM and Transformer models is attributed to the nature of the dataset, which comprises low-dimensional statistical features without sequential dependencies. The ensemble design of the stacked DNN enhances representational diversity, making it particularly well-suited for structured behavior classification in compact feature spaces.

Recent studies in biomedical signal processing emphasize classification performance and the importance of training time and real-time feasibility for clinical or embedded deployment44,45,46. In line with these works, the current study also presents a comparative analysis of training and inference complexity for the proposed model versus the same classifiers in Table 5. The comparative results reveal that while the Single DNN offers the fastest training and inference times due to its minimal architecture, it does so at the cost of reduced classification accuracy. In contrast, the stacked DNN provides the highest classification performance, particularly on challenging child behavior data, while maintaining a moderate computational cost that is acceptable for near-real-time applications. The Transformer and LSTM models, although powerful in temporal or high-dimensional tasks, demonstrate slower performance in both training and inference due to their sequential and attention-based structures, which are not optimally suited for the low-dimensional statistical features used in this study.

Table 5 Comparative evaluation of machine learning models on adult and child datasets using multiple performance metrics.

Full size table

A valuable direction for future work would be to evaluate the performance of the stacked DNN model when trained solely on accelerometer-derived features from both adult and child datasets. This approach would ensure full feature consistency between training and testing, addressing real-world scenarios where only a single sensor modality is available. In addition to the currently used Kurtosis metric, future investigations could explore more expressive and complex statistical or frequency-domain features derived from accelerometer data — such as skewness, energy, entropy, spectral centroid, and wavelet-based coefficients — to capture richer behavioral signatures. Incorporating these features could help bridge the gap left by the absence of gyroscopic data and potentially improve classification robustness. Furthermore, assessing the comparative performance of lightweight temporal models like 1D CNNs on these expanded accelerometer-only feature sets could also contribute to optimized real-time deployment in mobile or embedded pediatric monitoring systems.
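To make this direction concrete, the sketch below computes a few of the named accelerometer-only candidates on one synthetic 25-sample frame. The feature definitions follow common signal-processing conventions (they are not specified in this paper), and the 30 Hz sampling rate is taken from the Data acquisition section:

```python
import numpy as np

def candidate_features(frame):
    """Sketch of accelerometer-only features proposed for future work.
    `frame` is one 25-sample resultant-acceleration instance."""
    frame = np.asarray(frame, dtype=float)
    centered = frame - frame.mean()
    sd = frame.std(ddof=1)
    # Skewness: asymmetry of the amplitude distribution.
    skewness = np.mean((centered / sd) ** 3)
    # Energy: mean squared magnitude of the signal.
    energy = np.mean(frame ** 2)
    # Spectral centroid: amplitude-weighted mean frequency (30 Hz rate).
    spectrum = np.abs(np.fft.rfft(centered))
    freqs = np.fft.rfftfreq(frame.size, d=1 / 30)
    centroid = np.sum(freqs * spectrum) / np.sum(spectrum)
    return {"skewness": skewness, "energy": energy, "centroid": centroid}

# Synthetic oscillatory frame standing in for a shaking interaction.
feats = candidate_features(np.sin(np.linspace(0, 4 * np.pi, 25)) + 1)
print(sorted(feats))  # ['centroid', 'energy', 'skewness']
```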

Towards social robots control and biomedical robotics

Implications for social robot control

The development of social robots capable of interacting with children in a contextually aware manner calls for a sophisticated control mechanism focused on safety and adaptability47. The research conducted in this study forms a foundational step towards creating such control systems in biomechanics through the utilization of stacked DNNs that objectively discern the range of interactions based on sensory data.

The proposed model’s high accuracy in classifying these interactions underscores the potential of machine learning to enhance robot autonomy. This capability is crucial not only for preventing potential damage to the robots but also for ensuring the safety of children during their interactions. Importantly, the real-time processing capability of the model facilitates the development of adaptive robotic behaviors. These adaptations could be instrumental in responding appropriately to the varied and spontaneous interactions of children, contributing positively to educational and therapeutic settings by potentially mitigating challenging situations and enhancing engagement.

While this study primarily focuses on classification accuracy, future implementation on mobile platforms will benefit from model compression techniques such as pruning and quantization to reduce inference time and memory footprint without sacrificing performance.

Contributions to biomedical robotics

Robotics are used in various fields, whether in biomechanics, medical devices, or monitoring systems48,49,50. In the biomedical domain, especially in therapies involving children with autism spectrum disorder or other developmental challenges, robots can play a vital role51. The ability of robots to consistently and accurately recognize and respond to various behaviors opens new avenues for intervention strategies that are personalized and adaptive to each child’s unique needs.

The data-driven approach outlined in the study ensures that the robot’s responses are grounded in empirical evidence, enhancing the robot’s utility as a therapeutic tool. The nuanced understanding of children’s interactions with robots also means that these machines can be more effectively used to gather data for longitudinal studies, potentially revealing insights into the progression of certain behaviors over time. For instance, the biomedical engineering applications highlighted in Fig. 9 can all benefit from the data-driven approach for accurate control and classification purposes.

Fig. 9: Leveraging robots for biomedical engineering applications52.

Robots equipped with behavior recognition capabilities have shown promise in therapeutic contexts such as supporting children with autism spectrum disorders, where consistent social interaction and engagement tracking are crucial53,54,55. These systems can also be adapted for use in pediatric rehabilitation settings to help monitor motor responses and encourage developmental exercises through playful interaction.

Despite these promising applications, real-world deployment requires overcoming several challenges: ensuring model interpretability for clinicians and caregivers, validating the system in diverse clinical environments, and addressing issues related to data privacy, robustness across age groups, and integration with existing therapeutic workflows.

Conclusions

This study embarked on a comprehensive exploration of contemporary methodologies in human-robot interaction, focusing on the nuances of child-robot dynamics. The conclusions of this investigation are multifaceted, highlighting several key accomplishments:

  1. A meticulous review was undertaken to gauge the cutting-edge strategies employed in the study of human-robot interplay, ensuring the research was grounded in a robust understanding of the current academic landscape;

  2. The investigation harnessed an existing dataset that captures a spectrum of child-robot interactions, providing a rich foundation for analysis;

  3. Statistical metrics were integrated (kurtosis for the accelerometer data and the SMA for the gyroscope data), enhancing the fidelity of interaction monitoring;

  4. A novel stacked model was introduced, achieving the highest recorded accuracy, precision, recall, and F1 scores of 0.941, 0.94, 0.941, and 0.939, respectively, on the adult dataset, illustrating the model’s robust predictive capabilities;

  5. The research pioneered avenues for real-time behavioral monitoring via mobile platforms, establishing proactive monitoring of child-robot interactions;

  6. The study advanced the fields of biomedical engineering and pediatric care by suggesting directions for the integration of social robots into therapeutic environments, with potential applications extending across various interdisciplinary domains.

Data availability

The datasets generated and/or analysed during the current study are available in the Harvard Dataverse repository, [https://doi.org/10.7910/DVN/FHOO0Q].

References

  1. Mejia, C. & Kajikawa, Y. Bibliometric analysis of social robotics research: identifying research trends and knowledgebase. Applied Sciences 7(12), 1316 https://doi.org/10.3390/app7121316 (2017).

  2. Fosso Wamba, S., Queiroz, M. M. & Hamzi, L. A bibliometric and multi-disciplinary quasi-systematic analysis of social robots: past, future, and insights of human-robot interaction. Technol. Forecast. Soc. Change. 197, 122912 (2023).

  3. Cappuccio, M. L., Galliott, J. C., Eyssel, F. & Lanteri, A. Autonomous systems and technology resistance: new tools for monitoring acceptance, trust, and tolerance. Int. J. Soc. Robot. https://doi.org/10.1007/s12369-023-01065-2 (2023).

  4. Pozzi, L., Guerini, S., Arrigoni, S., Pedrocchi, A. & Gandolla, M. A robotic assistant for disabled chess players in competitive games. Int. J. Soc. Robot. https://doi.org/10.1007/s12369-023-01069-y (2023).

  5. Dubois-Sage, M., Jacquet, B., Jamet, F. & Baratgin, J. We do not anthropomorphize a robot based only on its cover: context matters too! Applied Sciences 13(15), 8743 https://doi.org/10.3390/app13158743 (2023).

  6. Zhang, C., Chen, J., Li, J., Peng, Y. & Mao, Z. Large Language models for human-robot interaction: A review. Biomim. Intell. Rob. 100131 https://doi.org/10.1016/j.birob.2023.100131 (2023).

  7. Hou, S. et al. Young children with autism show atypical prefrontal cortical responses to humanoid robots: an fNIRS study. Int. J. Psychophysiol. 181, 23–32 (2022).

  8. Lv, Z., Poiesi, F., Dong, Q., Lloret, J. & Song, H. Deep Learning for Intelligent Human–Computer Interaction. Applied Sciences 12, (2022).

  9. Peter, J., Kühne, R. & Barco, A. Can social robots affect children’s prosocial behavior? An experimental study on prosocial robot models. Comput. Hum. Behav. 120, 106712 (2021).

  10. Ali, S., Park, H. W. & Breazeal, C. A social robot’s influence on children’s figural creativity during gameplay. Int. J. Child. Comput. Interact. 28, 100234 (2021).

  11. Al-Haddad, A. A. et al. Towards dental diagnostic systems: synergizing wavelet transform with generative adversarial networks for enhanced image data fusion. Comput. Biol. Med. 182, 109241 (2024).

  12. Al-Haddad, L. A., Alawee, W. H. & Basem, A. Advancing task recognition towards artificial limbs control with ReliefF-based deep neural network extreme learning. Comput. Biol. Med. 107894 https://doi.org/10.1016/j.compbiomed.2023.107894 (2023).

  13. Alawee, W. H., Al-Haddad, L. A., Basem, A. & Al-Haddad A. A. A data augmentation approach to enhance breast cancer detection using generative adversarial and artificial neural networks. Open Engineering 14(1), 20240052 (2024).

  14. Alawee, W. H., Basem, A. & Al-Haddad, L. A. Advancing biomedical engineering: leveraging Hjorth features for electroencephalography signal analysis. J. Electr. Bioimpedance. 14, 66–72 (2023).

  15. Podpečan, V. Can You Dance? A Study of Child–Robot Interaction and Emotional Response Using the NAO Robot. Multimodal Technologies and Interaction 7, (2023).

  16. Neumann, M. M., Koch, L. C., Zagami, J., Reilly, D. & Neumann, D. L. Preschool children’s engagement with a social robot compared to a human instructor. Early Child. Res. Q. 65, 332–341 (2023).

  17. Torpegaard, J., Knudsen, L. S., Linnet, M. P., Skov, M. B. & Merritt, T. Preschool children’s social and playful interactions with a play-facilitating cardboard robot. Int. J. Child. Comput. Interact. 31, 100435 (2022).

  18. Arent, K. et al. The use of social robots in the diagnosis of autism in preschool children. Applied Sciences 12(17), 8399 https://doi.org/10.3390/app12178399 (2022).

  19. Wang, C. P. Training children with autism spectrum disorder, and children in general with AI robots related to the automatic organization of sentence menus and interaction design evaluation. Expert Syst. Appl. 229, 120527 (2023).

  20. Lee, J. & Nagae, T. Social distance in interactions between children with autism and robots. Applied Sciences 11(22), 10520 https://doi.org/10.3390/app112210520 (2021).

  21. van Straten, C. L., Peter, J. & Kühne, R. Transparent robots: how children perceive and relate to a social robot that acknowledges its lack of human psychological capacities and machine status. Int. J. Hum. Comput. Stud. 177, 103063 (2023).

  22. Barber, O., Somogyi, E., McBride, E. A. & Proops, L. Exploring the role of aliveness in children’s responses to a dog, biomimetic robot, and toy dog. Comput. Hum. Behav. 142, 107660 (2023).

  23. Alhaddad, A. Y., Cabibihan, J. J. & Bonarini, A. Datasets for recognition of aggressive interactions of children toward robotic toys. Data Brief. 34, 106697 (2021).

  24. Alhaddad, A. Y., Cabibihan, J. J. & Bonarini, A. Recognition of aggressive interactions of children toward robotic toys. in 28th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN) 1–8 (2019). https://doi.org/10.1109/RO-MAN46459.2019.8956375

  25. Alhaddad, A. Y., Cabibihan, J. J. & Bonarini, A. Real-Time social robot’s responses to undesired interactions between children and their surroundings. Int. J. Soc. Robot. 15, 621–629 (2023).

  26. Al-Haddad, L. A. & Jaber, A. A. Improved UAV blade unbalance prediction based on machine learning and relieff supreme feature ranking method. J. Brazilian Soc. Mech. Sci. Eng. 45, 463 (2023).

  27. Duranta, D. U. S. et al. Enhancing atrial fibrillation detection accuracy: A wavelet transform filtered single lead ECG signal analysis with artificial neural networks and novel feature extraction. Mach. Learn. Appl. 12, 100472 (2023).

  28. Al-Karkhi, M. I., Rzadkowski, G., Ibraheem, L. & Aqib, M. Anomaly detection in electrical systems using machine learning and statistical analysis. Terra Joule J. 1, 3 (2024).

  29. Al-Haddad, L. A. et al. Advancing sustainability in buildings using an integrated aerodynamic façade: potential of artificial intelligence. Terra Joule J. 1, 1 (2024).

  30. Abdul-Zahra, A. S., Ghane, E., Kamali, A. & Farhan Ogaili, A. A. Power forecasting in continuous extrusion of pure titanium using Naïve Bayes algorithm. Terra Joule J. 1, 2 (2024).

  31. Malik, M., Sharma, P., Punj, G. K., Singh, S. & Gared, F. Multi-body sensor based drowsiness detection using convolutional programmed transfer VGG-16 neural network with automatic driving mode conversion. Sci. Rep. 15, 8838 (2025).

  32. Bunyan, S. T. et al. Intelligent Thermal Condition Monitoring for Predictive Maintenance of Gas Turbines Using Machine Learning. Machines 13, (2025).

  33. Al-Haddad, L. A., Fattah, M. Y., Al-Soudani, W. H. S., Al-Haddad, S. A. & Jaber, A. A. Enhanced Load-Settlement curve forecasts for Open-Ended pipe piles incorporating soil plug constraints using shallow and deep neural networks. China Ocean. Eng. 39, 562–572 (2025).

  34. Shams, O. A., Al-Baity, H. B. M. & Al-Haddad, L. A. Experimental investigation and laser control in Ti10Mo6Cu powder bed fusion: optimizing process parameters with machine learning. Discov Mater 5(125), https://doi.org/10.1007/s43939-025-00322-7 (2025).

  35. Mukherjee, P. & Roy, A. H. Detection of stress in human brain. in Second International Conference on Advanced Computational and Communication Paradigms (ICACCP) 1–6 (IEEE, 2019).

  36. Al-Haddad, L. A., Jaber, A. A., Al-Haddad, S. A. & Al-Muslim, Y. M. Fault diagnosis of actuator damage in UAVs using embedded recorded data and stacked machine learning models. J. Supercomput. https://doi.org/10.1007/s11227-023-05584-7 (2023).

  37. Lesnoff, M. et al. Averaging and stacking partial least squares regression models to predict the chemical compositions and the nutritive values of forages from spectral near infrared data. Applied Sciences 12(15), 7850–7865 https://doi.org/10.3390/app12157850 (2022).

  38. Hadi Fadhil, T., Al-Karkhi, M. I. & Al-Haddad, L. A. Legal and communication challenges in smart grid cybersecurity: classification of network resilience under cyber attacks using machine learning. J. Commun. 20, 221–228 (2025).

  39. Al-Haddad, L. A. et al. Energy consumption and efficiency degradation predictive analysis in unmanned aerial vehicle batteries using deep neural networks. Adv. Sci. Technol. Res. J. 19, 21–30 (2025).

  40. Khan, Z. H., Mekid, S., Al-Haddad, L. A. & Jaber, A. A. AI Enabled Manufacturing: A Deep Learning Approach to Network Fault Detection. in Proceedings of 2025 4th International Conference on Computing and Information Technology, ICCIT 2025 245–250 (2025). https://doi.org/10.1109/ICCIT63348.2025.10989388

  41. Al-Haddad, L. A. & Jaber, A. A. An Intelligent Quadcopter Unbalance Classification Method Based on Stochastic Gradient Descent Logistic Regression. in 2022 3rd Information Technology To Enhance e-learning and Other Application (IT-ELA) 152–156 (2022). https://doi.org/10.1109/IT-ELA57378.2022.10107922

  42. Al-Haddad, L. A. et al. Enhancing Building sustainability through aerodynamic shading devices: an integrated design methodology using finite element analysis and optimized neural networks. Asian J. Civil Eng. https://doi.org/10.1007/s42107-024-01047-3 (2024).

  43. Al-Haddad, L. A. & Jaber, A. A. An intelligent fault diagnosis approach for multirotor UAVs based on deep neural network of multi-resolution transform features. Drones 7, 82 (2023).

  44. De, S., Mukherjee, P. & Roy, A. H. TasteNet: A novel deep learning approach for EEG-based basic taste perception recognition using CEEMDAN domain entropy features. J. Neurosci. Methods. 419, 110463 (2025).

  45. De, S., Mukherjee, P. & Roy, A. H. GLEAM: A multimodal deep learning framework for chronic lower back pain detection using EEG and sEMG signals. Comput. Biol. Med. 189, 109928 (2025).

  46. Mukherjee, P. & Halder Roy, A. A deep learning-based comprehensive robotic system for lower limb rehabilitation. Biomed. Signal. Process. Control. 100, 107178 (2025).

  47. Beyer-Wunsch, P. & Reichstein, C. Effects of a humanoid robot on the Well-being for hospitalized children in the pediatric Clinic – An experimental study. Procedia Comput. Sci. 176, 2077–2087 (2020).

  48. Yang, I., Gammell, J. D., Murray, D. W. & Mellon, S. J. Application of a robotics path planning algorithm to assess the risk of mobile bearing dislocation in lateral unicompartmental knee replacement. Sci. Rep. 12, 2068 (2022).

  49. Kholodilin, I., Zhang, Z., Guo, Q. & Grigorev, M. Calibration of the omnidirectional vision system for robotics sorting system. Sci. Rep. 15, 10256 (2025).

  50. Frenkel, J. et al. Stakeholder acceptance of a robot-assisted social training scenario for autistic children compared to a tablet-computer-based approach. Sci. Rep. 15, 11237 (2025).

  51. Lima, R. P., Passerino, L. M., Henriques, B., Preuss, R. V. & Bercht, M. E. Asistranto: an assistive educational platform for promotion of interest in autistic children. Procedia Comput. Sci. 160, 385–393 (2019).

  52. Yang, Y. & Jiao, P. Nanomaterials and nanotechnology for biomedical soft robots. Mater. Today Adv. 17, 100338 (2023).

  53. Komariyah, D., Kaoru, I., Natsuka, S., Cahya, B. & Ito, Y. The acceptance of the potential use of social robots for children with autism spectrum disorder by Indonesian occupational therapists: a mixed methods study. Disabil. Rehabil. Assist. Technol. 20, 397–407 (2025).

  54. Madrid Ruiz, E. P., Oscanoa Fernández, H. H., García Cena, C. E. & Cedazo León, R. Design of JARI: A robot to enhance social interaction in children with autism spectrum disorder. Machines 13, (2025).

  55. Wang, W., Xiao, J. & Diao, L. The effects of robots on children with autism spectrum disorder: A Meta-analysis. J. Autism Dev. Disord. https://doi.org/10.1007/s10803-025-06883-z (2025).

Acknowledgements

The authors extend their appreciation to the Deanship of Scientific Research at Northern Border University, Arar, KSA for funding this research work through the project number “NBU-FFR-2025-289-03”. The authors also declare that this article has been produced with the financial support of the European Union under the REFRESH – Research Excellence For Region Sustainability and High-tech Industries project number CZ.10.03.01/00/22_003/0000048 via the Operational Programme Just Transition.

Author information

Authors and Affiliations

  1. Biomedical Engineering Department, Al-Nahrain University, Baghdad, Iraq

    Sadiq J. Hamandi

  2. Mechanical Engineering Department, University of Technology- Iraq, Baghdad, Iraq

    Luttfi A. Al-Haddad

  3. Center for Scientific Research and Entrepreneurship, Northern Border University, Arar, 73213, Saudi Arabia

    Shaaban M. Shaaban

  4. Applied Science Research Center, Applied Science Private University, Amman, 11931, Jordan

    Aymen Flah

  5. ENET Centre, CEET, VSB-Technical University of Ostrava, Ostrava, Czech Republic

    Aymen Flah

  6. College of Engineering, University of Business and Technology (UBT), Jeddah, 21448, Saudi Arabia

    Aymen Flah

Authors

  1. Sadiq J. Hamandi
  2. Luttfi A. Al-Haddad
  3. Shaaban M. Shaaban
  4. Aymen Flah

Contributions

Conceptualization, S.J.H., S.M.S., and A.F.; methodology, L.A.A.; software, L.A.A.; validation, L.A.A.; formal analysis, L.A.A.; investigation, L.A.A.; writing—original draft preparation, L.A.A.; writing—review and editing, S.J.H., S.M.S., and A.F. All authors have read and agreed to the published version of the manuscript.

Corresponding authors

Correspondence to Luttfi A. Al-Haddad, Shaaban M. Shaaban or Aymen Flah.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article


Cite this article

Hamandi, S.J., Al-Haddad, L.A., Shaaban, S.M. et al. Child behavior recognition in social robot interaction using stacked deep neural networks and biomechanical signals. Sci Rep 15, 35995 (2025). https://doi.org/10.1038/s41598-025-19728-7
