Introduction



Methods and data

Data source


Group environment questionnaire (GEQ)


Sports passion questionnaire (SPQ)

\(a_i\) and \(a_i^*\) are the Lagrange multipliers, and \(k(x_i, x)\) is the kernel function. A number of different kernels can be used to generate the inner product of the machine in the input space. Different kernels produce different nonlinear decision surfaces and therefore different results, so the kernel is chosen according to the characteristics of the input. A common choice is the Gaussian radial basis function (RBF), which we also select as the kernel function in this paper; it is defined in Eq. (20).

Sports mental toughness questionnaire (SMTQ)


Athlete engagement questionnaire (AEQ)

The SPQ was compiled by Vallerand (2010)37 and has shown favorable reliability and validity with Chinese athletes. The scale comprises 16 items across three dimensions: general passion, harmonious passion and obsessive passion. The Cronbach’s alpha coefficient for this scale is 0.941, which meets the research requirements.

Machine learning model construction process

The RFR algorithm is an ensemble learning algorithm that combines multiple decision trees for prediction and is particularly suited to high-dimensional, nonlinear and complex datasets. The method effectively prevents overfitting and allows rapid training. As shown in Fig. 1, the RFR algorithm consists of many decision trees, each constructed from data samples drawn from the training set. In regression tasks, the predictions of the individual trees are averaged to obtain the final prediction. Deep decision trees tend to overfit, but random forests mitigate this issue by training each tree on a subset of the features and samples41. Features and samples are randomly selected from the input data, each decision tree is trained independently, and the predictions of these trees are then aggregated, yielding more accurate predictions than a single decision tree model51.
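To make the averaging step concrete, the following is a minimal sketch of fitting a random forest regressor with scikit-learn; the synthetic data, feature count and hyperparameter values are illustrative assumptions rather than the study’s actual configuration.

```python
# Minimal RFR sketch (illustrative data and settings, not the study's setup)
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
# Hypothetical feature matrix: cohesion, passion, mental toughness per athlete
X = rng.uniform(1, 5, size=(326, 3))
# Hypothetical engagement target
y = X @ np.array([0.4, 0.3, 0.3]) + rng.normal(0, 0.2, size=326)

# Each tree is trained on a bootstrap sample with a random feature subset;
# the forest prediction is the average of the individual tree predictions.
rfr = RandomForestRegressor(n_estimators=100, max_features="sqrt", random_state=42)
rfr.fit(X, y)
print(rfr.predict(X[:3]))
```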
 
Machine learning model establishment process. This research employed Python version 3.9.1 and utilized the Sklearn library to construct a machine learning prediction model for athlete engagement. The specific methodology is outlined as follows:

Data collection and preprocessing

The potential for implementing machine learning in sports is therefore extensive. The application of machine learning to sports prediction offers significant opportunities for China to play a leading role on the global sports stage, with a profound impact on the overall competitiveness of the sports industry and on scientific excellence in sports.

In order to obtain a sufficient dataset, this study distributed questionnaires via Wenjuanxing to athletes in multiple professional sports teams and high-level national training teams from Zhejiang, Heilongjiang and Liaoning provinces. From 1 June to 1 October 2021, a total of 445 questionnaires were distributed. Of the participants, 175 (53.7%) were male, a higher proportion than female; the athletes were aged between 17 and 22 years and had about three years of training experience. The sports included basketball (74), volleyball (36), soccer (63), cricket (47), ice hockey (20), curling (11), team aerobics (47) and others (28). All surveys were conducted after sports training sessions. The questionnaire can be divided into three parts. The first part made clear that the survey was voluntary and anonymous, and that responses would be used only by the researchers and not for profit or any other purpose. The second part collected basic information about the athletes. The third part contained the scales required for this study: cohesion (Group Environment Questionnaire), passion (Sports Passion Questionnaire), mental toughness (Sports Mental Toughness Questionnaire) and athlete engagement (Athlete Engagement Questionnaire).

$$N = \frac{X - X_{min}}{X_{max} - X_{min}}$$
(1)

where N represents the normalized value, X represents the measured value, and \(X_{min}\) and \(X_{max}\) represent the minimum and maximum values, respectively.
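As an illustration of Eq. (1), the sketch below applies min-max normalization both directly and via scikit-learn’s MinMaxScaler; the sample values are placeholders.

```python
# Min-max normalization per Eq. (1); MinMaxScaler usage is an illustrative choice
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[2.0], [3.5], [5.0]])             # hypothetical raw scale scores
N_manual = (X - X.min()) / (X.max() - X.min())  # Eq. (1) applied directly
N_sklearn = MinMaxScaler().fit_transform(X)     # same result per feature column
print(np.allclose(N_manual, N_sklearn))         # True
```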

Written informed consent was signed by each team representative and all participants were informed that they could withdraw from the study at any time. All methods were carried out in accordance with relevant guidelines and regulations. This study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Ethics Committee of Zhejiang Normal University (ZSRT2022079).

Division of training set and test set

where \(x_i\) is an element of the training set, \(x\) is an element of the test set, and \(\sigma\) is a parameter of the RBF. The parameters of the SVM are normally predetermined, and cross-validation is used to determine their values; however, an SVM with fixed parameters is not well suited to changing athlete data.
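For reference, a minimal sketch of dividing the data into training and test sets and checking parameter values by cross-validation follows; the 80/20 split ratio, random seed and synthetic data are assumptions for illustration, not the study’s exact setup.

```python
# Illustrative train/test split and cross-validation check (assumed 80/20 split)
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.svm import SVR

X = np.random.rand(326, 3)   # hypothetical normalized features
y = np.random.rand(326)      # hypothetical engagement scores

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Cross-validation on the training set can be used to assess parameter choices
scores = cross_val_score(SVR(kernel="rbf", C=1.0, gamma="scale"),
                         X_train, y_train, cv=5, scoring="r2")
print(scores.mean())
```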

Table 1 Comparative analysis of RMSE for predicting different feature combinations under different machine learning algorithms.

Machine learning model algorithm test

The results of the SHAP analysis clearly show that the three features cohesion, passion and mental toughness have a decisive influence on the model’s predictions. Figure 4 shows the average SHAP values for the features mental toughness (MT), passion (Pa), cohesion (Co), sports satisfaction (SS) and psychological collectivism (PC), visualising the average influence of each feature on the model’s predictions. Features MT, Pa and Co have higher average SHAP values, indicating that they occupy a significant position in the prediction. We further refine the analysis to the level of a single sample, revealing the relationship between the feature values and their SHAP values and demonstrating the high accuracy of the model by comparing the predicted value with the actual value, as shown in Fig. 5. In this sample, the positive effect of feature MT is particularly significant, while the effects of features PC and SS are relatively minor. Combining the analyses in Figs. 4 and 5, we conclude that in the study sample feature MT is the most critical factor influencing the prediction results, features Pa and Co also have a significant impact, and features PC and SS have a relatively weak impact. These findings provide valuable information for an in-depth understanding of the model’s decision-making mechanism and indicate the key features to focus on during data analysis and model optimisation.
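As an illustration of how such average SHAP values can be computed, the sketch below uses the model-agnostic Kernel SHAP explainer from the shap package on a placeholder model; the feature matrix, target and model choice are assumptions, not the study’s data or final model.

```python
# Illustrative SHAP analysis (placeholder data and model, not the study's)
import numpy as np
import shap
from sklearn.svm import SVR

feature_names = ["Co", "Pa", "MT", "PC", "SS"]
X = np.random.rand(200, 5)                       # hypothetical feature matrix
y = X @ np.array([0.25, 0.3, 0.35, 0.05, 0.05])  # hypothetical engagement target

model = SVR(kernel="rbf").fit(X, y)

# Kernel SHAP with a small background sample (model-agnostic, so it works for SVR)
explainer = shap.KernelExplainer(model.predict, shap.sample(X, 50))
shap_values = explainer.shap_values(X[:50], nsamples=100)

# Mean absolute SHAP value per feature = average effect on model output (cf. Fig. 4)
for name, val in zip(feature_names, np.abs(shap_values).mean(axis=0)):
    print(f"{name}: {val:.4f}")
```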


Feature selection of machine learning model

GBRT is an ensemble algorithm with a powerful learning strategy. Although it was initially designed to address classification problems, it has been successfully applied to regression48. In the gradient boosting process, each step adds an elementary tree that reduces the loss function, ultimately minimizing it. In this way, GBRT is able to learn complex nonlinear relationships within the data.
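A minimal sketch of fitting a GBRT model with scikit-learn’s GradientBoostingRegressor follows; the hyperparameter values and synthetic data are placeholders rather than the tuned settings reported later.

```python
# Illustrative GBRT fit (placeholder data and hyperparameters)
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

X = np.random.rand(326, 3)
y = np.random.rand(326)

# Each boosting stage fits a small regression tree to the negative gradient
# of the loss and adds it to the ensemble with a shrinkage (learning rate).
gbrt = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3)
gbrt.fit(X, y)
print(gbrt.predict(X[:3]))
```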

Table 2 Comparative analysis of predicting different feature combinations under different machine learning algorithms.

Output machine learning model prediction results

The regression tree \(h_m(x)\) is constructed by modeling the relationship between \(z_m(x)\) and \(x\). The weighting factor can be obtained by multiplying the importance of each feature by the corresponding coefficient of the model, and its formula is expressed in Eq. (12).

Machine learning model evaluation


$$\overline{Y} = \frac{1}{m}\sum_{i=1}^{m} Y_i$$
(2)


$$R^{2} = 1 - \frac{\sum_{i=1}^{m}\left( Y_i - \widehat{Y_i} \right)^{2}}{\sum_{i=1}^{m}\left( Y_i - \overline{Y} \right)^{2}}$$
(3)
$$RMSE = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left( Y_i - \widehat{Y_i} \right)^{2}}$$
(4)
$$MSE = \frac{1}{m}\sum_{i=1}^{m}\left( Y_i - \widehat{Y_i} \right)^{2}$$
(5)
$$MAE = \frac{1}{m}\sum_{i=1}^{m}\left| Y_i - \widehat{Y_i} \right|$$
(6)
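The four metrics in Eqs. (3)-(6) can be computed with scikit-learn as sketched below; the y_true/y_pred vectors are placeholder values.

```python
# Evaluation metrics of Eqs. (3)-(6) with placeholder predictions
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

y_true = np.array([3.2, 4.1, 2.8, 3.9])
y_pred = np.array([3.0, 4.3, 2.9, 3.7])

r2 = r2_score(y_true, y_pred)               # Eq. (3)
mse = mean_squared_error(y_true, y_pred)    # Eq. (5)
rmse = np.sqrt(mse)                         # Eq. (4)
mae = mean_absolute_error(y_true, y_pred)   # Eq. (6)
print(f"R2={r2:.4f} RMSE={rmse:.4f} MSE={mse:.4f} MAE={mae:.4f}")
```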

Machine learning algorithm theory

LR algorithm


$$\hat{y}\left( w,x \right) = \omega_{0} + \omega_{1} x_{1} + \cdots + \omega_{c} x_{c} + b$$
(7)


KNN algorithm

Figure 7 displays the prediction accuracy of the regression models constructed with the different machine learning algorithms. By comparing the R² scores, the overall effectiveness of the different prediction models can be determined. The R² score, also known as the coefficient of determination, is an important metric for evaluating the quality of regression models: it measures the extent to which the model explains the variability in the dependent variable, reflecting the accuracy of the model’s predictions. In model evaluation, a higher R² score is preferred, indicating that the model fits the data better.

$$\hat{y}(x) = \frac{\sum_{x_i \in N_k(x)} \omega(x, x_i)\, y_i}{\sum_{x_i \in N_k(x)} \omega(x, x_i)}$$
(8)
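A minimal sketch of distance-weighted KNN regression corresponding to Eq. (8) is given below; the choice of k and the synthetic data are illustrative, and scikit-learn’s "distance" weighting corresponds to the weight function of Eq. (8) with p = 1.

```python
# Illustrative distance-weighted KNN regression (cf. Eq. (8))
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

X = np.random.rand(326, 3)
y = np.random.rand(326)

# weights="distance" weights each neighbour by the inverse of its distance to x
# (the weight function in Eq. (8) with power p = 1); p=2 is the Euclidean metric.
knn = KNeighborsRegressor(n_neighbors=16, weights="distance", p=2)
knn.fit(X, y)
print(knn.predict(X[:3]))
```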


GBRT algorithm


$$F_{0}(x) = \arg\min_{\gamma} \sum_{t=1}^{N} L\left( y_{t}, \gamma \right)$$
(9)
$$F_{m}(x) = F_{m-1}(x) + \gamma_{m} h_{m}(x)$$
(10)


$$z_{m}(x_{t}) = -\frac{\partial L\left( y_{t}, F_{m-1}(x_{t}) \right)}{\partial F_{m-1}(x_{t})}$$
(11)


$$\gamma_{m} = \arg\min_{\gamma} \sum_{t=1}^{N} L\left( y_{t}, F_{m-1}(x_{t}) + \gamma\, h_{m}(x_{t}) \right)$$
(12)
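To connect Eqs. (9)-(12) to code, the following is a minimal hand-written boosting loop for squared-error loss, in which the pseudo-residuals of Eq. (11) reduce to y − F_{m−1}(x); the tree depth, learning rate and data are illustrative assumptions.

```python
# Hand-rolled gradient boosting for squared-error loss (illustrative sketch)
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.random.rand(200, 3)
y = np.random.rand(200)

F = np.full_like(y, y.mean())   # Eq. (9): constant model minimizing squared loss
nu = 0.1                        # learning rate (shrinkage)
trees = []
for m in range(100):
    z = y - F                                    # Eq. (11): pseudo-residuals
    h = DecisionTreeRegressor(max_depth=2).fit(X, z)
    gamma = 1.0                                  # Eq. (12): fitting h to the residuals
                                                 # makes the optimal step close to 1
    F = F + nu * gamma * h.predict(X)            # Eq. (10): additive update
    trees.append(h)

print(np.mean((y - F) ** 2))    # training MSE after boosting
```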

DTR algorithm

In the construction of the machine learning models, the RFECV method was used for feature selection. As an efficient wrapper-based feature-selection technique, RFECV combines the advantages of Recursive Feature Elimination (RFE) and Cross-Validation (CV), enabling automatic selection of the feature subset with the greatest impact on model performance.

Google Scholar
 

$$(j,s)^* = \mathop{\arg\min}_{j,s}\left[ \sum_{x_i \in R_{1}(j,s)} \left( y_i - c_{1} \right)^{2} + \sum_{x_i \in R_{2}(j,s)} \left( y_i - c_{2} \right)^{2} \right]$$
(13)


$$c_{m} = \frac{1}{N_{m}} \sum_{x_i \in R_{m}(j,s)} y_i, \quad m = 1,2$$
(14)

where \(N_m\) is the number of samples in sub-region \(R_m(j,s)\). The data are divided into regions by the selected \((j,s)\), and the corresponding output values are determined, as shown in Eq. (15).

$$R_{1}(j,s) = \left\{ x \mid x_{j} \le s \right\}, \quad R_{2}(j,s) = \left\{ x \mid x_{j} > s \right\}$$
(15)

The hyperparameters of the LR model include copy_X (default True, prevents modification of the original data), fit_intercept (default True, calculates the intercept), n_jobs (default None, no parallel computation) and positive (default False, does not force the coefficients to be positive). For the optimal configurations of SWO combined with GBR, DTR, RFR and KNNR, the hyperparameters are as follows. In the GBR model, the learning rate is set to 0.1837, a relatively low value that allows finer weight adjustments during training and thus improves prediction accuracy; the maximum depth is 1.0, indicating a simple tree structure that prevents overfitting; and the number of estimators is 19.13, indicating that predictive ability is enhanced by adding more trees. In the DTR model, the optimal maximum depth is about 4.46, which helps the model capture complex patterns in the data while avoiding the overfitting caused by an overly deep tree. In the RFR model, the number of estimators is 18.85, indicating that generalisation is enhanced by integrating multiple decision trees; the maximum depth is 37.85, allowing the model to learn more complex feature relationships; and the minimum number of samples per leaf is 7.62, ensuring that leaf nodes contain sufficient information. For the KNNR model, the optimal number of neighbours is 16, meaning that the model makes predictions by considering the 16 nearest neighbours, and the weighting is set to "distance", so that the model takes the distances between data points into account when calculating the predicted values.
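Because the optimizers return continuous values, integer-valued hyperparameters such as tree depth and the number of estimators must be rounded before being passed to the estimators. The sketch below shows one plausible way to do this with scikit-learn, reusing the values quoted above purely for illustration; it is not the authors’ code.

```python
# Illustrative mapping of continuous optimizer outputs to valid hyperparameters
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor

gbr = GradientBoostingRegressor(learning_rate=0.1837,
                                max_depth=max(1, round(1.0)),       # -> 1
                                n_estimators=max(1, round(19.13)))  # -> 19
dtr = DecisionTreeRegressor(max_depth=max(1, round(4.46)))          # -> 4
rfr = RandomForestRegressor(n_estimators=max(1, round(18.85)),      # -> 19
                            max_depth=max(1, round(37.85)),         # -> 38
                            min_samples_leaf=max(1, round(7.62)))   # -> 8
knnr = KNeighborsRegressor(n_neighbors=16, weights="distance")
# The configured estimators would then be fit on the training data as usual.
```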

This process is applied recursively to both subregions until the stopping condition is satisfied. Eventually, the input space is partitioned into \(M\) regions \(R_m\) and a decision tree \(f(x)\) is generated, as shown in Eq. (16).

$$f(x) = \sum_{m=1}^{M} \hat{c}_{m}\, I\left( x \in R_{m} \right)$$
(16)

Athlete engagement is influenced by several factors, including cohesion, passion and mental toughness. Machine learning methods are frequently employed to construct predictive models because of their high efficiency. To understand the effects of cohesion, passion and mental toughness on athlete engagement, this study uses machine learning methods to construct a prediction model and uncover the intrinsic connections between these factors. Methods for constructing and comparing predictive models built with different machine learning algorithms are investigated in order to determine the optimal model. The results show that the PSO-SVR model performs best in predicting athlete engagement, with a prediction accuracy of 0.9262, along with low RMSE (0.1227), MSE (0.0146) and MAE (0.0656). The prediction accuracy of the PSO-SVR model exhibits an obvious advantage, mainly attributable to its strong generalization ability, its capacity to handle nonlinearity, and its ability to optimize and adapt to the feature space. Particularly noteworthy is that, compared with advanced algorithms such as SWO, the PSO-SVR model significantly reduces the RMSE (by 7.54%), MSE (by 17.05%) and MAE (by 3.53%) while improving R² (by 1.69%). These results indicate that the PSO-SVR model not only improves prediction accuracy but also enhances reliability, making it a powerful tool for predicting athlete engagement. In summary, this study provides a new perspective for understanding athlete engagement as well as practical guidance for improving athlete engagement and overall performance. By adopting the PSO-SVR model, we can more accurately identify and optimise the key factors affecting athlete engagement, with far-reaching implications for research and practice in sport science and related fields.

RFR algorithm


Fig. 1. Algorithm for making predictions using random forest.
In our proposed method, as shown in Fig. 2, PSO is used to optimize the parameters (C and g) of SVR and the data is divided into training and validation sets. We select the best parameters by optimizing the prediction accuracy of the validation set. The cost function is presented in Eq. (21).

SVR algorithm


$$f(x) = w^{T} \varphi(x) + b$$
(17)

Suppose the training set is \(T=\{(x_1,y_1),\cdots,(x_N,y_N)\}\), where \(x_i\) is the feature vector of an instance and \(y_i\in\{c_1,c_2,\cdots,c_n\}\) is the class of that instance, \(i=1,2,\cdots,N\). For a test instance \(x\), its class \(y\) can be represented as in47 Eq. (8).

$$R_{reg} = \frac{1}{2}\left\| w \right\|^{2} + C \times \frac{1}{n} \sum_{i=1}^{n} \left| y_i - f(x_i) \right|_{\varepsilon}$$
(18)


$$f(x) = \sum_{i=1}^{n} \left( a_i - a_i^* \right) k\left( x_i, x \right) + b$$
(19)

The choice of PSO-SVR over ensemble learning is based mainly on its advantages in interpretability, its suitability for small datasets and its strong generalization capability. This makes it particularly suitable for scenarios where accurate predictions must be extracted from limited data. In contrast, ensemble learning models tend to be more complex, demand greater computational resources, and may perform poorly on small samples.

$$f(x) = \sum_{i=1}^{n} \left( a_i - a_i^* \right) \exp\left( -\frac{\left\| x_i - x \right\|^{2}}{2\sigma^{2}} \right) + b$$
(20)
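A minimal sketch of an ε-insensitive SVR with the RBF kernel of Eq. (20) follows; the C, epsilon and gamma values are illustrative, and gamma plays the role of 1/(2σ²).

```python
# Illustrative SVR with RBF kernel (cf. Eqs. (17)-(20)); placeholder data/settings
import numpy as np
from sklearn.svm import SVR

X = np.random.rand(326, 3)
y = np.random.rand(326)

# gamma corresponds to 1 / (2 * sigma^2) in Eq. (20)
svr = SVR(kernel="rbf", C=1.0, epsilon=0.1, gamma=0.5)
svr.fit(X, y)

# The fitted model has the form f(x) = sum_i (a_i - a_i*) k(x_i, x) + b, cf. Eq. (19)
print(svr.dual_coef_.shape, svr.intercept_)
```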


PSO-SVR algorithm

Using the RFECV method, we successfully selected the optimal feature subset for the machine learning model from cohesion, passion and mental toughness. Figure 3 demonstrates that the model was effective in predicting athlete engagement using the three feature subsets. The selected features comprised cohesion, passion and mental toughness, covering dimensions such as ATG-T, ATG-S, GI-T and GI-S; general passion, harmonious passion and obsessive passion; and confidence, constancy and control. The synergy of these features contributed to the model’s strong performance in predicting athlete engagement.

$$f(C,g) = \arg\min_{C,g} \sum_{m=1}^{M} \sum_{n=1}^{N} Loss\left( y_i - \hat{y}_i\left( c_{m}, g_{n} \right) \right)$$
(21)
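The sketch below is a minimal, self-contained PSO loop that searches for the SVR parameters (C, g) by minimizing validation error in the spirit of Eq. (21); the swarm size, inertia and acceleration coefficients, search bounds and synthetic data are all illustrative assumptions rather than the study’s configuration.

```python
# Illustrative PSO search over SVR parameters (C, g) using validation loss
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X, y = rng.random((326, 3)), rng.random(326)            # placeholder data
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

def cost(params):                 # validation loss for one particle (C, g)
    C, g = params
    model = SVR(kernel="rbf", C=C, gamma=g).fit(X_tr, y_tr)
    return mean_squared_error(y_val, model.predict(X_val))

lo, hi = np.array([0.1, 1e-3]), np.array([100.0, 1.0])  # assumed bounds for (C, g)
pos = rng.uniform(lo, hi, (20, 2))                       # 20 particles
vel = np.zeros((20, 2))
pbest, pbest_cost = pos.copy(), np.array([cost(p) for p in pos])
gbest = pbest[pbest_cost.argmin()]

for _ in range(30):               # standard PSO velocity/position updates
    r1, r2 = rng.random((20, 2)), rng.random((20, 2))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, lo, hi)
    costs = np.array([cost(p) for p in pos])
    better = costs < pbest_cost
    pbest[better], pbest_cost[better] = pos[better], costs[better]
    gbest = pbest[pbest_cost.argmin()]

print("best C, g:", gbest, "validation MSE:", pbest_cost.min())
```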


Fig. 2


Results

Model feature selection results


Fig. 3


Fig. 4. Average effect of SHAP values on model outputs.
where \(\omega(x,x_i)\) is the weight function, usually defined as \(\omega(x,x_i)=\frac{1}{d(x,x_i)^{p}}\), \(d\) is the distance function and \(p\) is the power parameter of the Minkowski distance.
Fig. 5
The study participants included professional athletes and high-level college athletes, selected from the national training team and various regions in China, such as Zhejiang, Heilongjiang and Liaoning. These athletes have had sports careers spanning more than three years and have shown exceptional performance in their fields. Questionnaires were administered after athletic training sessions. Researchers obtained consent from both coaches and athletes, thoroughly explaining the study’s objectives and confidentiality protocols. Each team representative signed an informed consent form to ensure respondents were fully aware of any potential risks. To reduce the influence of coaches, the questionnaires were given to team captains for distribution. Participants took about 12 min to complete the questionnaire. The collected data included basic information, cohesion, passion, mental toughness and athlete engagement. We performed preprocessing on the 445 collected responses with the aim of identifying and handling outliers and invalid entries. Responses were flagged in two specific scenarios: first, where all answers to the questionnaire were identical, suggesting a lack of variability in the data; and second, where the time taken to complete the questionnaire was exceptionally brief, amounting to less than 3 min. Following this stringent vetting process, we removed the questionnaires meeting either condition. This meticulous approach resulted in a dataset comprising 326 high-quality, valid entries, a response rate of 73.3%. Importantly, the dataset contained no missing values, which reinforces the integrity and dependability of our study findings.


Table 3 Comparative analysis of optimization algorithms for various machine learning models.
Fig. 6
We employed the recursive feature elimination with cross-validation (RFECV) method40 for feature selection, iteratively removing features with lower weights until the model’s performance ceased to improve. Features are ranked by importance, and RFECV automatically identifies the optimal number of features to retain, ensuring that the model includes the most predictive ones.
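A minimal sketch of RFECV with scikit-learn is given below; because RFECV requires an estimator that exposes feature importances or coefficients, a random forest is used here as an assumed base estimator, and the data are placeholders.

```python
# Illustrative RFECV feature selection (assumed base estimator and placeholder data)
import numpy as np
from sklearn.feature_selection import RFECV
from sklearn.ensemble import RandomForestRegressor

X = np.random.rand(326, 11)   # hypothetical sub-dimension scores
y = np.random.rand(326)

selector = RFECV(estimator=RandomForestRegressor(n_estimators=50, random_state=0),
                 step=1, cv=5, scoring="r2")
selector.fit(X, y)
print("optimal number of features:", selector.n_features_)
print("selected feature mask:", selector.support_)
```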

By considering the advantages and disadvantages of these algorithms and optimization strategies, we found that the SVR model combined with PSO performs best in predicting athlete engagement. The PSO-SVR model optimizes the parameters of the SVR through the PSO algorithm, which effectively improves predictive performance and reduces the risk of overfitting. Compared with the other models, it handles nonlinear relationships and complex feature interactions better. This result indicates that the PSO-SVR model has higher accuracy and stronger generalization ability in the task of athlete engagement prediction, providing a valuable reference for future research and practice.

Comparison of prediction models under different machine learning algorithms

where \(\hat{c}_m\) is the predicted value of the \(m\)-th region and \(I(x \in R_m)\) is an indicator function determining whether \(x\) belongs to region \(R_m\). In this way, the decision tree can predict and analyze the data effectively.

Fig. 7
The other four algorithms showed lower accuracy and higher errors, indicating that these models are not as effective as PSO-SVR in terms of prediction performance. In addition, the computational cost is closely related to the training and inference time of the model, which is especially evident across the different optimization algorithms. For example, the GBR model with GA takes as long as 184.3377 s, while the GBR model with PSO takes only 35.9754 s; similarly, applying PSO optimization to the SVR model takes only 5.7664 s. This study suggests that the PSO-SVR model is one of the best models for predicting athlete engagement based on cohesion, passion and mental toughness, demonstrating high reliability and validity.

As a core concept in the field of positive psychology, engagement refers to an individual’s persistent mental state filled with positive emotions, which is an important factor influencing an individual’s behavioral performance and the direction of behavioral development. Previous research on sport performance has mostly focused on burnout as leading to maladaptation and sport dropout4,5. With the growth of the positive psychology movement, however, athlete engagement has received much attention and related research has become increasingly prominent6. Early identification of low-engagement individuals allows for intervention before the risk of burnout and disengagement arises7. It should be noted that a state of non-burnout is not the same as engagement, and that low engagement is not burnout. Positive feedback can be obtained when we view athlete performance in a positive manner. Therefore, athlete engagement, as an important indicator of athletes’ positive psychology, plays an important role in improving athletes’ competitiveness and athletic performance. A high level of engagement stimulates the positive qualities that promote the development and maturation of athletes8, lays the foundation for enhancing their athletic ability9 and ultimately translates into significant improvements in athletic performance and achievement7.

Fig. 8

In this study, we conducted a comprehensive analysis of the dataset with the aim of evaluating the applicability of different machine learning algorithms, based on their characteristics and advantages, for constructing a prediction model of athlete engagement. We trained and tested LR, KNN, GBRT, DTR, RFR and SVR models, as well as models combined with four optimization algorithms (RGS, GA, SWO and PSO), with the aim of identifying the model most suitable for predicting athlete engagement. During the course of our research, we found that each algorithm and its optimization strategy face unique challenges and potential limitations:

Fig. 9
To go beyond these limitations, future research should consider introducing more physiological and behavioral data or employing finer classification techniques to reveal the true state of the athlete. For the model to be of broad benefit, we suggest that its outputs be integrated with data fed back from wearable devices. Instant feedback can then be provided to coaches through tools such as coaching dashboards in order to tailor training programs to the individual needs of the athlete. By monitoring changes in athletes throughout the training cycle, we can gain a more dynamic and comprehensive view of the data. In addition, the integrated data can provide data-driven decision support for team management and optimize resource allocation. Multi-model integration and adaptive learning techniques can be used to extend the model and ensure its performance in complex environments. Incorporating advanced machine learning techniques such as deep learning is also expected to further improve prediction accuracy. Exploring hybrid optimization strategies, such as PSO combined with the Nutcracker Optimization Algorithm (NOA) or the Spider Wasp Optimizer (SWO), is an innovative way to improve the performance of support vector machines (SVM). These research paths will not only advance computer science and psychology research, but will also bring deeper insights and greater value to the field of exercise science by making these models more sophisticated and effective in both theoretical exploration and practical application.
Fig. 10


Fig. 11

Process of adaption in SVR.

Discussion

X.Z.: Conceptualization, writing and review. Z.K.L.: Analysis of experimental data and writing. S.G.: Research resource collection and experiment design. All authors reviewed the manuscript.

Radar plot of overall evaluation level of different machine learning algorithms.

Limitations and future research

In this model, the weight vector \(\omega\) can be obtained through the "coef_" attribute of the model, while the intercept \(b\) is obtained through the "intercept_" attribute. LR models usually do not require much hyperparameter tuning; the performance of a linear model depends largely on the degree to which the problem or data follow a linear distribution46. If the data do exhibit a linear relationship, LR models tend to provide good predictions. However, if the true distribution of the data is nonlinear, more complex models may need to be considered, or appropriate preprocessing of the data may be required to improve the performance of the LR model.
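A minimal sketch of reading the fitted weights and intercept from a scikit-learn LinearRegression model via the coef_ and intercept_ attributes, as described above; the data are placeholders.

```python
# Illustrative LR fit; coef_ gives the weights (omega), intercept_ gives b
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.random.rand(326, 3)
y = np.random.rand(326)

lr = LinearRegression(fit_intercept=True, copy_X=True, positive=False)
lr.fit(X, y)
print("weights (omega):", lr.coef_)
print("intercept (b):", lr.intercept_)
```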

Conclusion
