Causally aware reinforcement learning agents for autonomous cyber defence

Bibliographic details
Title:	Causally aware reinforcement learning agents for autonomous cyber defence
Authors:	Tom Purves, Kostas Kyriakopoulos, Sian Jenkins, Iain Phillips, Tim Dudman
Publication year:	2024
Collection:	Loughborough University: Figshare
Subject terms:	Commerce, management, tourism and services, Information and computing sciences, Artificial intelligence, Data management and data science, Machine learning, Psychology, Autonomous cyber defence, Reinforcement learning, Causal inference, Structural Causal Model
Description:	Artificial Intelligence (AI) is seen as a disruptive solution to the ever-increasing security threats on network infrastructures. To automate the process of defending networked environments from such threats, approaches such as Reinforcement Learning (RL) have been used to train agents in cyber adversarial games. One primary challenge is how contextual information could be integrated into RL models to create agents which adapt their behaviour to adversarial posture. Two desirable characteristics identified for such models are that they should be interpretable and causal. To address this challenge, we propose an approach that integrates a causal rewards model with a modified Proximal Policy Optimisation (PPO) agent in Meta’s MBRL-Lib framework. Our RL agents are trained and evaluated against a range of cyber-relevant scenarios in the Dstl YAWNING-TITAN (YT) environment. We have constructed and experimented with two types of reward functions to facilitate the agent’s learning process. Evaluation metrics include, among others, games won by the defence agent (blue wins), episode length, healthy nodes and isolated nodes. Results show that, across all scenarios, our causally aware agent outperforms causally blind state-of-the-art benchmarks on the above evaluation metrics. In particular, with our proposed High Value Target (HVT) rewards function, which aims not to disrupt HVT nodes, the number of isolated nodes improves by 17% and 18% relative to the model-free and Neural Network (NN) model-based agents, respectively, across all scenarios. More importantly, on the blue wins metric, our agent outperformed the model-free and NN model-based agents by 40% and 17%, respectively, across all scenarios.
Document type:	article in journal/newspaper
Language:	unknown
Relation:	2134/27014740.v1; https://figshare.com/articles/journal_contribution/Causally_aware_reinforcement_learning_agents_for_autonomous_cyber_defence/27014740
Availability:	https://figshare.com/articles/journal_contribution/Causally_aware_reinforcement_learning_agents_for_autonomous_cyber_defence/27014740
Rights:	CC BY 4.0
Accession number:	edsbas.F24ADF30
Database:	BASE
