Description (Translated): |
This study analysed a dataset from elite youth football containing injury-related data, demographic information, and performance metrics. Its goals were to clarify the latest concepts and theories on sports injuries, to understand the analytical approaches used in sports injury research and, above all, to explore the use of machine learning methods for injury prediction and injury risk profiling. Traditional injury prevention models are limited, and the potential of machine learning to improve injury prediction and prevention depends greatly on understanding the complex interactions between the determinants of injury and on taking a time-sensitive approach to prediction. We therefore used a methodology and analysis approach that takes into consideration the cyclical nature of shifting risk factors to produce a dynamic, recursive picture of aetiology. The application of computer-based data mining algorithms was guided by CRISP-DM, which involves six phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment. We used stratified 10-fold cross-validation to partition the data and trained several machine learning models: Gradient Boost Classifier, Extreme Gradient Boosting, Decision Tree, K-Nearest Neighbours and Logistic Regression. For injury risk profiling we used the k-means clustering algorithm. We found that muscle injuries are the most common type of injury among elite youth football players. The Gradient Boost Classifier and Extreme Gradient Boosting classifier performed best in identifying injury risk, with F1-scores of 87% and 88%, respectively. Total exposure time, training exposure, and match exposure were the features that had the greatest impact on injury risk.
We conclude that machine learning is a promising approach for injury prediction and prevention, and we highlight the importance of considering multiple factors and their interactions in injury risk profiling. The main limitation of the study was the dataset itself, which was limited in both time range and data dimensions. For further studies we recommend expanding the data dimensions, developing and refining machine learning algorithms to improve prediction and risk profiling for specific types of injury, and exploring the application of these algorithms in real-world sports situations. |
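
The evaluation and profiling steps named in the abstract can be illustrated with a minimal sketch. It is not the study's code. It assumes scikit-learn, replaces the study's dataset with synthetic data from `make_classification`, leaves out Extreme Gradient Boosting to avoid an extra dependency, and uses illustrative choices the abstract does not specify (default hyperparameters, three clusters, fixed random seeds). The scores it prints say nothing about the reported results.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the study's dataset (assumption): 300 rows,
# 6 numeric features, roughly 30% of rows labelled "injured" (class 1).
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4,
    weights=[0.7, 0.3], random_state=0,
)

# Candidate classifiers with default hyperparameters (assumption).
# Distance- and gradient-based models are wrapped with a scaler.
models = {
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}

# Stratified 10-fold cross-validation keeps the injured/non-injured
# ratio approximately constant in every fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Mean F1-score of the positive class across the 10 folds.
f1_by_model = {
    name: cross_val_score(model, X, y, cv=cv, scoring="f1").mean()
    for name, model in models.items()
}

for name, score in f1_by_model.items():
    print(f"{name}: mean F1 = {score:.3f}")

# Risk profiling: k-means on standardised features. The number of
# clusters (3) is a placeholder, not a value taken from the study.
X_scaled = StandardScaler().fit_transform(X)
profiles = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)
print("cluster sizes:", [int((profiles == k).sum()) for k in range(3)])
```

Stratification matters here because injury labels are usually imbalanced: plain k-fold splitting could leave some folds with very few injured cases and make per-fold F1-scores unstable.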