An Intelligent Decision-making Scheme in a Dynamic Multi-objective Environment using Deep Reinforcement Learning

Hasan, Md Mahmudul (2020) An Intelligent Decision-making Scheme in a Dynamic Multi-objective Environment using Deep Reinforcement Learning. Doctoral thesis, Anglia Ruskin University.

Accepted Version (PDF, 7MB).
Available under the following license: Creative Commons Attribution Non-commercial No Derivatives.
Abstract

Real-life problems are dynamic and associated with a decision-making process with multiple options. Some of these dynamic decision-making problems must be solved through optimisation, and they become especially challenging when we need to trade off between multiple parameters in a decision-making process in a dynamic environment. With the help of artificial intelligence (AI), however, we may solve these problems effectively. This research investigates the development of an intelligent decision-making scheme for a dynamic multi-objective environment using a deep reinforcement learning (DRL) algorithm. This includes developing a benchmark in the area of dynamic multi-objective optimisation in reinforcement learning (RL) settings, which stimulated the development of an improved testbed based on the conventional deep-sea treasure (DST) benchmark. The proposed testbed is created by changing the optimal Pareto front (PF) and Pareto set (PS). To the best of my knowledge, this is the first dynamic multi-objective testbed for RL settings. Moreover, a framework is proposed to handle multiple objectives in a dynamic environment that fundamentally maintains an equilibrium between different objectives to provide a compromised solution close to the true PF. As a proof of concept, the proposed model has been implemented in a real-world scenario to predict vulnerable zones based on water quality resilience in São Paulo, Brazil. The proposed algorithm, namely parity-Q deep Q-network (PQDQN), is successfully implemented and tested, and the agent outperforms comparable agents in terms of achieving the goal (i.e. obtained rewards). Although the agent requires more elapsed time (i.e. a larger number of steps) to be trained than the multi-objective Monte Carlo tree search (MO-MCTS) agent in a particular event, its accuracy in finding Pareto-optimal solutions is significantly enhanced compared to the multi-policy DQN (MPDQN) and multi-Pareto Q-learning (MPQ) algorithms.
The outcome reveals that the proposed algorithm can find the optimum solution in a dynamic environment. It allows a new objective to be accommodated without any retraining or behaviour tuning of the agent, and it governs which policy is selected. As far as the dynamic DST testbed is concerned, it provides researchers with a new dimension for their work, enabling them to test their algorithms on problems that are dynamic in nature.
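The abstract refers to Pareto fronts over vector rewards, a core notion in multi-objective RL. The following minimal sketch (not part of the thesis record; the function names and the example reward vectors are illustrative assumptions) shows how Pareto dominance is typically checked and how a non-dominated front is filtered from a set of vector outcomes, in the spirit of the DST benchmark's (treasure value, negative step cost) trade-off:

```python
def dominates(a, b):
    """True if vector reward a Pareto-dominates b: a is at least as good
    in every objective and strictly better in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points):
    """Return the non-dominated subset of a list of vector rewards."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

# Hypothetical (treasure value, -steps) outcomes from deep-sea treasure episodes
outcomes = [(1, -1), (2, -3), (3, -5), (2, -6), (1, -4)]
print(pareto_front(outcomes))  # → [(1, -1), (2, -3), (3, -5)]
```

In a dynamic testbed such as the one the thesis proposes, the set of non-dominated points itself shifts over time, so an agent must track a moving front rather than converge to a fixed one.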

Item Type: Thesis (Doctoral)
Keywords: Deep reinforcement learning, multi-policy, multi-objective optimisation, dynamic environment, deep Q network, vector rewards, benchmarks, water quality evaluation, resilience
Faculty: Theses from Anglia Ruskin University
Depositing User: Lisa Blanshard
Date Deposited: 16 Sep 2020 13:26
Last Modified: 09 Sep 2021 18:53
URI: https://arro.anglia.ac.uk/id/eprint/705890
